1. 06 April 2019, 5 commits
    • kprobes/x86: Blacklist non-attachable interrupt functions · d5813e77
      Committed by Andrea Righi
      [ Upstream commit a50480cb6d61d5c5fc13308479407b628b6bc1c5 ]
      
      These interrupt functions are already non-attachable by kprobes.
      Blacklist them explicitly so that they can show up in
      /sys/kernel/debug/kprobes/blacklist and tools like BCC can use this
      additional information.
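      
      The mechanism behind this kind of blacklisting is the NOKPROBE_SYMBOL()
      macro; a minimal sketch (the function name below is illustrative, not
      necessarily one of the symbols touched by this patch):
      
        /* NOKPROBE_SYMBOL() (linux/kprobes.h) records the symbol in the
         * kprobe blacklist, so registration on it is rejected and the
         * entry shows up in /sys/kernel/debug/kprobes/blacklist. */
        asmlinkage void smp_irq_work_interrupt(struct pt_regs *regs);
        NOKPROBE_SYMBOL(smp_irq_work_interrupt);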
      Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yonghong Song <yhs@fb.com>
      Link: http://lkml.kernel.org/r/20181206095648.GA8249@Dell
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/build: Mark per-CPU symbols as absolute explicitly for LLD · 648b949b
      Committed by Rafael Ávila de Espíndola
      [ Upstream commit d071ae09a4a1414c1433d5ae9908959a7325b0ad ]
      
      Accessing per-CPU variables is done by finding the offset of the
      variable in the per-CPU block and adding it to the address of the
      respective CPU's block.
      
      Section 3.10.8 of ld.bfd's documentation states:
      
        For expressions involving numbers, relative addresses and absolute
        addresses, ld follows these rules to evaluate terms:
      
        Other binary operations, that is, between two relative addresses
        not in the same section, or between a relative address and an
        absolute address, first convert any non-absolute term to an
        absolute address before applying the operator.
      
      Note that LLVM's linker does not adhere to GNU ld's implementation
      and as such requires implicitly-absolute terms to be explicitly marked
      as absolute in the linker script. Otherwise, it currently fails with:
      
        ld.lld: error: ./arch/x86/kernel/vmlinux.lds:153: at least one side of the expression must be absolute
        ld.lld: error: ./arch/x86/kernel/vmlinux.lds:154: at least one side of the expression must be absolute
        Makefile:1040: recipe for target 'vmlinux' failed
      
      This is not a functional change for ld.bfd, which converts the term to
      an absolute symbol anyway, as specified above.
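      
      A sketch of the kind of linker script change this implies, with the
      absolute marking made explicit (the INIT_PER_CPU() shape below is an
      assumption modeled on arch/x86/kernel/vmlinux.lds.S, not the verbatim
      patch):
      
        /* Mark the per-CPU symbol absolute explicitly so that ld.lld
         * evaluates the expression the same way ld.bfd already does. */
        #define INIT_PER_CPU(x) init_per_cpu__##x = ABSOLUTE(x) + __per_cpu_load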
      
      Based on a previous submission by Tri Vo <trong@android.com>.
      Reported-by: Dmitry Golovin <dima@golovin.in>
      Signed-off-by: Rafael Ávila de Espíndola <rafael@espindo.la>
      [ Update commit message per Boris' and Michael's suggestions. ]
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      [ Massage commit message more, fix typos. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Dmitry Golovin <dima@golovin.in>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Cao Jin <caoj.fnst@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tri Vo <trong@android.com>
      Cc: dima@golovin.in
      Cc: morbo@google.com
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20181219190145.252035-1-ndesaulniers@google.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/build: Specify elf_i386 linker emulation explicitly for i386 objects · d2053718
      Committed by George Rimar
      [ Upstream commit 927185c124d62a9a4d35878d7f6d432a166b74e3 ]
      
      The kernel uses the OUTPUT_FORMAT linker script command in its linker
      scripts. Most of the time, the -m option is passed to the linker with
      the correct architecture, but sometimes (at least for x86_64) the -m
      option contradicts the OUTPUT_FORMAT directive.
      
      Specifically, arch/x86/boot and arch/x86/realmode/rm produce i386 object
      files, but are linked with the -m elf_x86_64 linker flag when building
      for x86_64.
      
      The GNU linker manpage doesn't explicitly state any tie-breakers between
      -m and OUTPUT_FORMAT. But with BFD and Gold linkers, OUTPUT_FORMAT
      overrides the emulation value specified with the -m option.
      
      LLVM lld has a different behavior, however. When supplied with
      contradicting -m and OUTPUT_FORMAT values it fails with the following
      error message:
      
        ld.lld: error: arch/x86/realmode/rm/header.o is incompatible with elf_x86_64
      
      Therefore, just add the correct -m after the incorrect one (it overrides
      it), so the linker invocation looks like this:
      
        ld -m elf_x86_64 -z max-page-size=0x200000 -m elf_i386 --emit-relocs -T \
          realmode.lds header.o trampoline_64.o stack.o reboot.o -o realmode.elf
      
      This is not a functional change for GNU ld, because (although not
      explicitly documented) OUTPUT_FORMAT overrides -m EMULATION.
      
      Tested by building x86_64 kernel with GNU gcc/ld toolchain and booting
      it in QEMU.
      
       [ bp: massage and clarify text. ]
      Suggested-by: Dmitry Golovin <dima@golovin.in>
      Signed-off-by: George Rimar <grimar@accesssoftek.com>
      Signed-off-by: Tri Vo <trong@android.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Tri Vo <trong@android.com>
      Tested-by: Nick Desaulniers <ndesaulniers@google.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Matz <matz@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: morbo@google.com
      Cc: ndesaulniers@google.com
      Cc: ruiu@google.com
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190111201012.71210-1-trong@android.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • perf/aux: Make perf_event accessible to setup_aux() · efd85d83
      Committed by Mathieu Poirier
      [ Upstream commit 840018668ce2d96783356204ff282d6c9b0e5f66 ]
      
      When pmu::setup_aux() is called, the coresight PMU needs to know which
      sink to use for the session by looking up the information in the
      event's attr::config2 field.
      
      As such, simply replace the cpu information with the complete perf_event
      structure and change all affected customers.
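      
      A sketch of the resulting interface change (parameter names are
      illustrative; the shape follows the description above):
      
        /* Before: the driver only got the CPU number. */
        void *(*setup_aux)(int cpu, void **pages, int nr_pages, bool overwrite);
        
        /* After: the full perf_event is available, so coresight can read
         * event->attr.config2 to select the sink for the session. */
        void *(*setup_aux)(struct perf_event *event, void **pages,
                           int nr_pages, bool overwrite);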
      Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: Suzuki Poulouse <suzuki.poulose@arm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-s390@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190131184714.20388-2-mathieu.poirier@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/hyperv: Fix kernel panic when kexec on HyperV · c3f28d59
      Committed by Kairui Song
      [ Upstream commit 179fb36abb097976997f50733d5b122a29158cba ]
      
      After commit 68bb7bfb ("X86/Hyper-V: Enable IPI enlightenments"),
      kexec fails with a kernel panic:
      
      kexec_core: Starting new kernel
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
      RIP: 0010:0xffffc9000001d000
      
      Call Trace:
       ? __send_ipi_mask+0x1c6/0x2d0
       ? hv_send_ipi_mask_allbutself+0x6d/0xb0
       ? mp_save_irq+0x70/0x70
       ? __ioapic_read_entry+0x32/0x50
       ? ioapic_read_entry+0x39/0x50
       ? clear_IO_APIC_pin+0xb8/0x110
       ? native_stop_other_cpus+0x6e/0x170
       ? native_machine_shutdown+0x22/0x40
       ? kernel_kexec+0x136/0x156
      
      That happens if hypercall-based IPIs are used, because the hypercall page
      is reset very early upon kexec reboot, but kexec sends IPIs to stop CPUs,
      which invokes the hypercall and dereferences the unusable page.
      
      To fix this, reset hv_hypercall_pg to NULL before the page is reset to
      avoid any misuse; IPI sending will then fall back to the non-hypercall
      based method. This only happens on kexec/kdump, so just setting the
      pointer to NULL is good enough.
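      
      A minimal sketch of the resulting cleanup path (names follow the x86
      Hyper-V init code; treat the exact sequence as illustrative):
      
        void hyperv_cleanup(void)
        {
        	union hv_x64_msr_hypercall_contents hypercall_msr;
        
        	/* Clear the pointer first so __send_ipi_mask() falls back to
        	 * the non-hypercall IPI path instead of calling into a page
        	 * that is about to become unusable. */
        	hv_hypercall_pg = NULL;
        
        	/* Now the hypercall page itself can be reset safely. */
        	hypercall_msr.as_uint64 = 0;
        	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
        }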
      
      Fixes: 68bb7bfb ("X86/Hyper-V: Enable IPI enlightenments")
      Signed-off-by: Kairui Song <kasong@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: devel@linuxdriverproject.org
      Link: https://lkml.kernel.org/r/20190306111827.14131-1-kasong@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  2. 03 April 2019, 3 commits
    • KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts · 3a18eaba
      Committed by Sean Christopherson
      commit 0cf9135b773bf32fba9dd8e6699c1b331ee4b749 upstream.
      
      The CPUID flag ARCH_CAPABILITIES is unconditionally exposed to host
      userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES
      regardless of hardware support under the pretense that KVM fully
      emulates MSR_IA32_ARCH_CAPABILITIES.  Unfortunately, only VMX hosts
      handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS
      also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts).
      
      Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so
      that it's emulated on AMD hosts.
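      
      A hedged sketch of what "emulated in common x86 code" amounts to: the
      MSR read is served from state cached in the common vcpu structure, so
      it works on both VMX and SVM (field and helper names as commonly used
      in KVM at the time; treat as illustrative):
      
        /* In kvm_get_msr_common(), reachable from both vendor modules: */
        case MSR_IA32_ARCH_CAPABILITIES:
        	if (!msr_info->host_initiated &&
        	    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
        		return 1;
        	msr_info->data = vcpu->arch.arch_capabilities;
        	break;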
      
      Fixes: 1eaafe91 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")
      Cc: stable@vger.kernel.org
      Reported-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: update %rip after emulating IO · b9733a74
      Committed by Sean Christopherson
      commit 45def77ebf79e2e8942b89ed79294d97ce914fa0 upstream.
      
      Most (all?) x86 platforms provide a port IO based reset mechanism, e.g.
      OUT 92h or CF9h.  Userspace may emulate said mechanism, i.e. reset a
      vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM
      that it is doing a reset, e.g. Qemu jams vCPU state and resumes running.
      
      To avoid corrupting %rip after such a reset, commit 0967b7bf ("KVM:
      Skip pio instruction when it is emulated, not executed") changed the
      behavior of PIO handlers, i.e. today's "fast" PIO handling, to skip the
      instruction prior to exiting to userspace.  Full emulation doesn't need
      such tricks because re-emulating the instruction will naturally handle
      %rip being changed to point at the reset vector.
      
      Updating %rip prior to exiting to userspace has several drawbacks:
      
        - Userspace sees the wrong %rip on the exit, e.g. if PIO emulation
          fails it will likely yell about the wrong address.
        - Single step exits to userspace are effectively dropped as
          KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO.
        - Behavior of PIO emulation is different depending on whether it
          goes down the fast path or the slow path.
      
      Rather than skip the PIO instruction before exiting to userspace,
      snapshot the linear %rip and cancel PIO completion if the current
      value does not match the snapshot.  For a 64-bit vCPU, i.e. the most
      common scenario, the snapshot and comparison have negligible overhead
      as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra
      VMREAD in this case.
      
      All other alternatives to snapshotting the linear %rip that don't
      rely on an explicit reset announcement suffer from one corner case
      or another.  For example, canceling PIO completion on any write to
      %rip fails if userspace does a save/restore of %rip, and attempting to
      avoid that issue by canceling PIO only if %rip changed then fails if PIO
      collides with the reset %rip.  Attempting to zero in on the exact reset
      vector won't work for APs, which means adding more hooks such as the
      vCPU's MP_STATE, and so on and so forth.
      
      Checking for a linear %rip match technically suffers from corner cases,
      e.g. userspace could theoretically rewrite the underlying code page and
      expect a different instruction to execute, or the guest hardcodes a PIO
      reset at 0xfffffff0, but those are far, far outside of what can be
      considered normal operation.
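      
      A sketch of the snapshot-and-compare (kvm_get_linear_rip() and
      kvm_is_linear_rip() are existing KVM helpers; the surrounding details
      are illustrative):
      
        /* On the KVM_EXIT_IO path, remember where the PIO happened: */
        vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu);
        
        /* In the completion callback after userspace resumes the vCPU: */
        if (unlikely(!kvm_is_linear_rip(vcpu, vcpu->arch.pio.linear_rip)))
        	return 1;	/* %rip moved, e.g. a reset: don't skip anything */
        return kvm_skip_emulated_instruction(vcpu);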
      
      Fixes: 432baf60 ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O")
      Cc: <stable@vger.kernel.org>
      Reported-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y · a0713e81
      Committed by Thomas Gleixner
      commit bebd024e4815b1a170fcd21ead9c2222b23ce9e6 upstream.
      
      The SMT disable 'nosmt' command line argument is not working properly when
      CONFIG_HOTPLUG_CPU is disabled. The teardown of the sibling CPUs, which
      are required to be brought up due to the MCE issues, cannot work. The
      CPUs are then kept in a half-dead state.
      
      As the 'nosmt' functionality has become popular due to the speculative
      hardware vulnerabilities, the half torn down state is not a proper solution
      to the problem.
      
      Enforce CONFIG_HOTPLUG_CPU=y when SMP is enabled so the full operation is
      possible.
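      
      The resulting Kconfig shape (a sketch matching the description above;
      the user-visible prompt goes away and the option simply follows SMP):
      
        config HOTPLUG_CPU
        	def_bool y
        	depends on SMP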
      Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Konrad Wilk <konrad.wilk@oracle.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Mukesh Ojha <mojha@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190326163811.598166056@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 27 March 2019, 2 commits
    • x86/unwind: Add hardcoded ORC entry for NULL · 206a76a6
      Committed by Jann Horn
      commit ac5ceccce5501e43d217c596e4ee859f2a3fef79 upstream.
      
      When the ORC unwinder is invoked for an oops caused by IP==0,
      it currently has no idea what to do because there is no debug information
      for the stack frame of NULL.
      
      But if RIP is NULL, it is very likely that the last successfully executed
      instruction was an indirect CALL/JMP, and it is possible to unwind out in
      the same way as for the first instruction of a normal function. Hardcode
      a corresponding ORC entry.
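      
      A sketch of such an entry: at IP==0 the only stack state is the return
      address just pushed by the indirect CALL, exactly as at the first
      instruction of a normal function (the initializer below uses the ORC
      unwinder's existing constants but is illustrative, not the verbatim
      patch):
      
        /* Fake ORC entry for IP==0: the previous RSP is one word up, where
         * the indirect CALL pushed the return address. */
        static const struct orc_entry null_orc_entry = {
        	.sp_offset = sizeof(long),
        	.sp_reg    = ORC_REG_SP,
        	.bp_reg    = ORC_REG_UNDEFINED,
        	.type      = ORC_TYPE_CALL,
        };
        
        /* In orc_find(): */
        if (ip == 0)
        	return &null_orc_entry;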
      
      With an artificially-added NULL call in prctl_set_seccomp(), before this
      patch, the trace is:
      
      Call Trace:
       ? __x64_sys_prctl+0x402/0x680
       ? __ia32_sys_prctl+0x6e0/0x6e0
       ? __do_page_fault+0x457/0x620
       ? do_syscall_64+0x6d/0x160
       ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      After this patch, the trace looks like this:
      
      Call Trace:
       __x64_sys_prctl+0x402/0x680
       ? __ia32_sys_prctl+0x6e0/0x6e0
       ? __do_page_fault+0x457/0x620
       do_syscall_64+0x6d/0x160
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      prctl_set_seccomp() still doesn't show up in the trace because for some
      reason, tail call optimization is only disabled in builds that use the
      frame pointer unwinder.
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: syzbot <syzbot+ca95b2b7aef9e7cbd6ab@syzkaller.appspotmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: linux-kbuild@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190301031201.7416-2-jannh@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/unwind: Handle NULL pointer calls better in frame unwinder · 367ccafb
      Committed by Jann Horn
      commit f4f34e1b82eb4219d8eaa1c7e2e17ca219a6a2b5 upstream.
      
      When the frame unwinder is invoked for an oops caused by a call to NULL, it
      currently skips the parent function because BP still points to the parent's
      stack frame; the (nonexistent) current function only has the first half of
      a stack frame, and BP doesn't point to it yet.
      
      Add a special case for IP==0 that calculates a fake BP from SP, then uses
      the real BP for the next frame.
      
      Note that this handles first_frame specially: Return information about the
      parent function as long as the saved IP is >=first_frame, even if the fake
      BP points below it.
      
      With an artificially-added NULL call in prctl_set_seccomp(), before this
      patch, the trace is:
      
      Call Trace:
       ? prctl_set_seccomp+0x3a/0x50
       __x64_sys_prctl+0x457/0x6f0
       ? __ia32_sys_prctl+0x750/0x750
       do_syscall_64+0x72/0x160
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      After this patch, the trace is:
      
      Call Trace:
       prctl_set_seccomp+0x3a/0x50
       __x64_sys_prctl+0x457/0x6f0
       ? __ia32_sys_prctl+0x750/0x750
       do_syscall_64+0x72/0x160
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: syzbot <syzbot+ca95b2b7aef9e7cbd6ab@syzkaller.appspotmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: linux-kbuild@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190301031201.7416-1-jannh@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 24 March 2019, 13 commits
    • KVM: nVMX: Ignore limit checks on VMX instructions using flat segments · 5ffb710b
      Committed by Sean Christopherson
      commit 34333cc6c2cb021662fd32e24e618d1b86de95bf upstream.
      
      Regarding segments with a limit==0xffffffff, the SDM officially states:
      
          When the effective limit is FFFFFFFFH (4 GBytes), these accesses may
          or may not cause the indicated exceptions.  Behavior is
          implementation-specific and may vary from one execution to another.
      
      In practice, all CPUs that support VMX ignore limit checks for "flat
      segments", i.e. an expand-up data or code segment with base=0 and
      limit=0xffffffff.  This is subtly different than wrapping the effective
      address calculation based on the address size, as the flat segment
      behavior also applies to accesses that would wrap the 4g boundary, e.g.
      a 4-byte access starting at 0xffffffff will access linear addresses
      0xffffffff, 0x0, 0x1 and 0x2.
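      
      A hedged sketch of the check this implies in the nested VMX address
      decoding (the helper name exceeds_limit() is hypothetical; only the
      flat-segment condition is taken from the text above):
      
        /* Real CPUs skip limit checks for a flat expand-up segment:
         * base == 0 and limit == 0xffffffff. */
        bool flat = (s.base == 0) && (s.limit == 0xffffffff);
        
        if (!flat && exceeds_limit(&s, off, len))
        	exn = true;	/* leads to #GP(0) or #SS(0) injection */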
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: nVMX: Apply addr size mask to effective address for VMX instructions · 29b515c2
      Committed by Sean Christopherson
      commit 8570f9e881e3fde98801bb3a47eef84dd934d405 upstream.
      
      The address size of an instruction affects the effective address, not
      the virtual/linear address.  The final address may still be truncated,
      e.g. to 32-bits outside of long mode, but that happens irrespective of
      the address size, e.g. a 32-bit address size can yield a 64-bit virtual
      address when using FS/GS with a non-zero base.
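      
      In other words (a sketch with illustrative names), the truncation
      belongs on the effective address before the segment base is added,
      not on the final sum:
      
        if (addr_size == 1)		/* 32-bit address size */
        	off &= 0xffffffffull;	/* truncate the effective address... */
        gva = s.base + off;		/* ...before adding the segment base */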
      
      Fixes: 064aea77 ("KVM: nVMX: Decoding memory operands of VMX instructions")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: nVMX: Sign extend displacements of VMX instr's mem operands · 9ce0ffeb
      Committed by Sean Christopherson
      commit 946c522b603f281195af1df91837a1d4d1eb3bc9 upstream.
      
      The VMCS.EXIT_QUALIFICATION field reports the displacements of memory
      operands for various instructions, including VMX instructions, as a
      naturally sized unsigned value, but masks the value by the addr size,
      e.g. given a ModRM encoded as -0x28(%ebp), the -0x28 displacement is
      reported as 0xffffffd8 for a 32-bit address size.  Despite some weird
      wording regarding sign extension, the SDM explicitly states that bits
      beyond the instruction's address size are undefined:
      
          In all cases, bits of this field beyond the instruction’s address
          size are undefined.
      
      Failure to sign extend the displacement results in KVM incorrectly
      treating a negative displacement as a large positive displacement when
      the address size of the VMX instruction is smaller than KVM's native
      size, e.g. a 32-bit address size on a 64-bit KVM.
      
      The very original decoding, added by commit 064aea77 ("KVM: nVMX:
      Decoding memory operands of VMX instructions"), sort of modeled sign
      extension by truncating the final virtual/linear address for a 32-bit
      address size.  I.e. it messed up the effective address but made it work
      by adjusting the final address.
      
      When segmentation checks were added, the truncation logic was kept
      as-is and no sign extension logic was introduced.  In other words, it
      kept calculating the wrong effective address while mostly generating
      the correct virtual/linear address.  As the effective address is what's
      used in the segment limit checks, this results in KVM incorrectly
      injecting #GP/#SS faults due to non-existent segment violations when
      a nested VMM uses negative displacements with an address size smaller
      than KVM's native address size.
      
      Using the -0x28(%ebp) example, an EBP value of 0x1000 will result in
      KVM using 0x100000fd8 as the effective address when checking for a
      segment limit violation.  This causes a 100% failure rate when running
      a 32-bit KVM build as L1 on top of a 64-bit KVM L0.
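      
      A sketch of the corresponding decode fix (cast-based; the upstream
      patch may express the sign extension differently):
      
        /* The displacement in the exit qualification is masked to the
         * address size; sign-extend it so 0xffffffd8 becomes -0x28 again. */
        if (addr_size == 1)		/* 32-bit address size */
        	off = (s32)exit_qualification;
        else if (addr_size == 0)	/* 16-bit address size */
        	off = (s16)exit_qualification;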
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86/mmu: Do not cache MMIO accesses while memslots are in flux · c235af5a
      Committed by Sean Christopherson
      commit ddfd1730fd829743e41213e32ccc8b4aa6dc8325 upstream.
      
      When installing new memslots, KVM sets bit 0 of the generation number to
      indicate that an update is in-progress.  Until the update is complete,
      there are no guarantees as to whether a vCPU will see the old or the new
      memslots.  Explicitly prevent caching MMIO accesses so as to avoid using
      an access cached from the old memslots after the new memslots have been
      installed.
      
      Note that it is unclear whether or not disabling caching during the
      update window is strictly necessary as there is no definitive
      documentation as to what ordering guarantees KVM provides with respect
      to updating memslots.  That being said, the MMIO spte code does not
      allow reusing sptes created while an update is in-progress, and the
      associated documentation explicitly states:
      
          We do not want to use an MMIO sptes created with an odd generation
          number, ...  If KVM is unlucky and creates an MMIO spte while the
          low bit is 1, the next access to the spte will always be a cache miss.
      
      At the very least, disabling the per-vCPU MMIO cache during updates will
      make its behavior consistent with the MMIO spte behavior and
      documentation.
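      
      A minimal sketch of the added guard (bit 0 of the memslots generation
      is the in-progress marker; the cached fields shown are illustrative):
      
        static void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
        				 gva_t gva, gfn_t gfn, unsigned access)
        {
        	u64 gen = kvm_memslots(vcpu->kvm)->generation;
        
        	/* Bit 0 set: a memslot update is in flight, so anything
        	 * cached now might belong to the old or the new memslots. */
        	if (gen & 1)
        		return;
        
        	vcpu->arch.mmio_gva = gva & PAGE_MASK;
        	vcpu->arch.mmio_access = access;
        	vcpu->arch.mmio_gfn = gfn;
        	vcpu->arch.mmio_gen = gen;
        }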
      
      Fixes: 56f17dd3 ("kvm: x86: fix stale mmio cache bug")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86/mmu: Detect MMIO generation wrap in any address space · 656e9e5d
      Committed by Sean Christopherson
      commit e1359e2beb8b0a1188abc997273acbaedc8ee791 upstream.
      
      The check to detect a wrap of the MMIO generation explicitly looks for a
      generation number of zero.  Now that unique memslots generation numbers
      are assigned to each address space, only address space 0 will get a
      generation number of exactly zero when wrapping.  E.g. when address
      space 1 goes from 0x7fffe to 0x80002, the MMIO generation number will
      wrap to 0x2.  Adjust the MMIO generation to strip the address space
      modifier prior to checking for a wrap.
      
      Fixes: 4bd518f1 ("KVM: use separate generations for each address space")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: Call kvm_arch_memslots_updated() before updating memslots · 23ad135a
      Committed by Sean Christopherson
      commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream.
      
      kvm_arch_memslots_updated() is at this point in time an x86-specific
      hook for handling MMIO generation wraparound.  x86 stashes 19 bits of
      the memslots generation number in its MMIO sptes in order to avoid
      full page fault walks for repeat faults on emulated MMIO addresses.
      Because only 19 bits are used, wrapping the MMIO generation number is
      possible, if unlikely.  kvm_arch_memslots_updated() alerts x86 that
      the generation has changed so that it can invalidate all MMIO sptes in
      case the effective MMIO generation has wrapped so as to avoid using a
      stale spte, e.g. a (very) old spte that was created with generation==0.
      
      Given that the purpose of kvm_arch_memslots_updated() is to prevent
      consuming stale entries, it needs to be called before the new generation
      is propagated to memslots.  Invalidating the MMIO sptes after updating
      memslots means that there is a window where a vCPU could dereference
      the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
      spte that was created with (pre-wrap) generation==0.
      
      Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel/uncore: Fix client IMC events return huge result · 30cedf18
      Committed by Kan Liang
      commit 8041ffd36f42d8521d66dd1e236feb58cecd68bc upstream.
      
      The client IMC bandwidth events currently return very large values:
      
        $ perf stat -e uncore_imc/data_reads/ -e uncore_imc/data_writes/ -I 10000 -a
      
        10.000117222 34,788.76 MiB uncore_imc/data_reads/
        10.000117222 8.26 MiB uncore_imc/data_writes/
        20.000374584 34,842.89 MiB uncore_imc/data_reads/
        20.000374584 10.45 MiB uncore_imc/data_writes/
        30.000633299 37,965.29 MiB uncore_imc/data_reads/
        30.000633299 323.62 MiB uncore_imc/data_writes/
        40.000891548 41,012.88 MiB uncore_imc/data_reads/
        40.000891548 6.98 MiB uncore_imc/data_writes/
        50.001142480 1,125,899,906,621,494.75 MiB uncore_imc/data_reads/
        50.001142480 6.97 MiB uncore_imc/data_writes/
      
      The client IMC events are freerunning counters. They still use the
      old event encoding format (0x1 for data_read and 0x2 for data write).
      The counter bit width is calculated by common code, which assumes that
      the standard encoding format is used for the freerunning counters.
      As a result, incorrect bit width information is calculated.
      
      The patch intends to convert the old client IMC event encoding to the
      standard encoding format.
      
      Current common code uses event->attr.config, which is copied directly
      from user space. We should not implicitly modify it for a converted
      event. The event->hw.config is used to replace the event->attr.config
      in common code.
      
      For client IMC events, the event->attr.config is used to calculate a
      converted event with standard encoding format in the custom
      event_init(). The converted event is stored in event->hw.config.
      For other events of freerunning counters, they already use the standard
      encoding format. The same value as event->attr.config is assigned to
      event->hw.config in common event_init().
      Reported-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Jin Yao <yao.jin@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: stable@kernel.org # v4.18+
      Fixes: 9aae1780 ("perf/x86/intel/uncore: Clean up client IMC uncore")
      Link: https://lkml.kernel.org/r/20190227165729.1861-1-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/kprobes: Prohibit probing on optprobe template code · ee7d297f
      Committed by Masami Hiramatsu
      commit 0192e6535ebe9af68614198ced4fd6d37b778ebf upstream.
      
      Prohibit probing on the optprobe template code, since it is not real
      executable code but a template instruction sequence. If the template
      were modified (e.g. by a probe), the copies made from it would be broken.
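      
      A hedged sketch of the check (the template is bracketed by the
      existing optprobe_template_entry/optprobe_template_end symbols; where
      exactly the patch places the test is not shown here):
      
        /* Reject probes landing inside the optprobe template: it is data
         * that gets copied, not code that is executed in place. */
        if (addr >= (unsigned long)optprobe_template_entry &&
            addr <  (unsigned long)optprobe_template_end)
        	return -EINVAL;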
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrea Righi <righi.andrea@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 9326638c ("kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation")
      Link: http://lkml.kernel.org/r/154998787911.31052.15274376330136234452.stgit@devbox
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen: fix dom0 boot on huge systems · 468ff43f
      Committed by Juergen Gross
      commit 01bd2ac2f55a1916d81dace12fa8d7ae1c79b5ea upstream.
      
      Commit f7c90c2a ("x86/xen: don't write ptes directly in 32-bit
      PV guests") introduced a regression for booting dom0 on huge systems
      with lots of RAM (in the TB range).
      
      Reason is that on those hosts the p2m list needs to be moved early in
      the boot process and this requires temporary page tables to be created.
      Said commit modified xen_set_pte_init() to use a hypercall for writing
      a PTE, but this requires the page table being in the direct mapped
      area, which is not the case for the temporary page tables used in
      xen_relocate_p2m().
      
      As the page tables are completely written before being linked into the
      actual address space, a plain memory write can be used in
      xen_relocate_p2m() instead of set_pte().
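      
      A one-line sketch of the change in xen_relocate_p2m() (illustrative):
      since the temporary tables are not yet hooked into the live address
      space, a direct store is safe and avoids the hypercall that cannot
      handle non-direct-mapped pages:
      
        /* Before: set_pte(pt + pt_idx, pfn_pte(p2m_pfn, PAGE_KERNEL)); */
        pt[pt_idx] = pfn_pte(p2m_pfn, PAGE_KERNEL);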
      
      Fixes: f7c90c2a ("x86/xen: don't write ptes directly in 32-bit PV guests")
      Cc: stable@vger.kernel.org
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: x86/morus - fix handling chunked inputs and MAY_SLEEP · 66700c89
      Committed by Eric Biggers
      commit 2060e284e9595fc3baed6e035903c05b93266555 upstream.
      
      The x86 MORUS implementations all fail the improved AEAD tests because
      they produce the wrong result with some data layouts.  The issue is that
      they assume that if the skcipher_walk API gives 'nbytes' not aligned to
      the walksize (a.k.a. walk.stride), then it is the end of the data.  In
      fact, this can happen before the end.
      
      Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can
      incorrectly sleep in the skcipher_walk_*() functions while preemption
      has been disabled by kernel_fpu_begin().
      
      Fix these bugs.
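      
      A hedged sketch of the corrected walk pattern (crypt_chunk() and
      crypt_tail() are illustrative stand-ins for the SIMD helpers): whole
      strides are processed until the walk genuinely ends, and calls that
      may sleep stay outside the kernel_fpu_begin()/end() section:
      
        while (walk.nbytes >= walk.stride) {
        	/* 'nbytes' smaller than the stride does NOT mean the data
        	 * has ended; only process whole strides here. */
        	unsigned int n = round_down(walk.nbytes, walk.stride);
        
        	kernel_fpu_begin();
        	crypt_chunk(state, walk.dst.virt.addr, walk.src.virt.addr, n);
        	kernel_fpu_end();
        	/* may sleep with MAY_SLEEP, hence outside the FPU section */
        	skcipher_walk_done(&walk, walk.nbytes - n);
        }
        if (walk.nbytes) {	/* final partial block */
        	kernel_fpu_begin();
        	crypt_tail(state, walk.dst.virt.addr, walk.src.virt.addr,
        		   walk.nbytes);
        	kernel_fpu_end();
        	skcipher_walk_done(&walk, 0);
        }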
      
      Fixes: 56e8e57f ("crypto: morus - Add common SIMD glue code for MORUS")
      Cc: <stable@vger.kernel.org> # v4.18+
      Cc: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: x86/aesni-gcm - fix crash on empty plaintext · 8a9fcf4a
      Committed by Eric Biggers
      commit 3af349639597fea582a93604734d717e59a0e223 upstream.
      
      gcmaes_crypt_by_sg() dereferences the NULL pointer returned by
      scatterwalk_ffwd() when encrypting an empty plaintext and the source
      scatterlist ends immediately after the associated data.
      
      Fix it by only fast-forwarding to the src/dst data scatterlists if the
      data length is nonzero.
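      
      A sketch of the guard (variable names illustrative): only fast-forward
      past the associated data when there is payload to process:
      
        if (req->cryptlen) {
        	/* With an empty plaintext, scatterwalk_ffwd() can return
        	 * NULL here when the scatterlist ends right after the AAD,
        	 * so skip the fast-forward entirely in that case. */
        	src = scatterwalk_ffwd(src_start, req->src, req->assoclen);
        	if (req->src != req->dst)
        		dst = scatterwalk_ffwd(dst_start, req->dst,
        				       req->assoclen);
        }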
      
      This bug is reproduced by the "rfc4543(gcm(aes))" test vectors when run
      with the new AEAD test manager.
      
      Fixes: e8455207 ("crypto: aesni - Update aesni-intel_glue to use scatter/gather")
      Cc: <stable@vger.kernel.org> # v4.17+
      Cc: Dave Watson <davejwatson@fb.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: x86/aegis - fix handling chunked inputs and MAY_SLEEP · 5d2a5172
      Committed by Eric Biggers
      commit ba6771c0a0bc2fac9d6a8759bab8493bd1cffe3b upstream.
      
      The x86 AEGIS implementations all fail the improved AEAD tests because
      they produce the wrong result with some data layouts.  The issue is that
      they assume that if the skcipher_walk API gives 'nbytes' not aligned to
      the walksize (a.k.a. walk.stride), then it is the end of the data.  In
      fact, this can happen before the end.
      
      Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can
      incorrectly sleep in the skcipher_walk_*() functions while preemption
      has been disabled by kernel_fpu_begin().
      
      Fix these bugs.
      
      Fixes: 1d373d4e ("crypto: x86 - Add optimized AEGIS implementations")
      Cc: <stable@vger.kernel.org> # v4.18+
      Cc: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/CPU: Add Icelake model number · a9503ade
      Committed by Rajneesh Bhardwaj
      [ Upstream commit 8cd8f0ce0d6aafe661cb3d6781c8b82bc696c04d ]
      
      Add the CPUID model number of Icelake (ICL) mobile processors to the
      Intel family list. Icelake U/Y series uses model number 0x7E.
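      
      The addition itself is essentially a one-liner in the Intel family
      header (a sketch; the macro name follows that header's naming
      convention at the time):
      
        /* arch/x86/include/asm/intel-family.h */
        #define INTEL_FAM6_ICELAKE_MOBILE	0x7E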
      Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David E. Box" <david.e.box@intel.com>
      Cc: dvhart@infradead.org
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: platform-driver-x86@vger.kernel.org
      Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190214115712.19642-2-rajneesh.bhardwaj@linux.intel.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  5. 19 March 2019, 3 commits
  6. 14 March 2019, 9 commits
    • perf/x86/intel: Implement support for TSX Force Abort · 26b6e018
      Committed by Peter Zijlstra (Intel)
      commit 400816f60c543153656ac74eaf7f36f6b7202378 upstream
      
      Skylake (and later) will receive a microcode update to address a TSX
      erratum. This microcode will, on execution of a TSX instruction
      (speculative or not), use (clobber) PMC3. This update will also provide
      a new MSR to change this behaviour, along with a CPUID bit to enumerate
      the presence of this new MSR.
      
      When the MSR gets set, the microcode will no longer use PMC3 but will
      Force Abort every TSX transaction (upon executing COMMIT).
      
      When TSX Force Abort (TFA) is allowed (the default), the MSR gets set
      when PMC3 gets scheduled and cleared when, after scheduling, PMC3 is
      unused.
      
      When TFA is not allowed, clear PMC3 from all constraints such that it
      will not get used.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86: Add TSX Force Abort CPUID/MSR · fdd82094
      Committed by Peter Zijlstra (Intel)
      commit 52f64909409c17adf54fcf5f9751e0544ca3a6b4 upstream
      
      Skylake systems will receive a microcode update to address a TSX
      erratum. This microcode will (by default) clobber PMC3 when TSX
      instructions are (speculatively or not) executed.
      
      It also provides an MSR to cause all TSX transaction to abort and
      preserve PMC3.
      
      Add the CPUID enumeration and MSR definition.
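      
      A sketch of the two definitions (MSR index and CPUID bit as publicly
      documented for TSX Force Abort; treat as illustrative of the patch):
      
        /* CPUID.(EAX=7,ECX=0):EDX[13] enumerates the new MSR */
        #define X86_FEATURE_TSX_FORCE_ABORT	(18*32+13)
        
        /* Setting bit 0 aborts all RTM transactions and preserves PMC3 */
        #define MSR_TSX_FORCE_ABORT		0x0000010f
        #define MSR_TFA_RTM_FORCE_ABORT_BIT	0
        #define MSR_TFA_RTM_FORCE_ABORT	BIT_ULL(MSR_TFA_RTM_FORCE_ABORT_BIT)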
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel: Generalize dynamic constraint creation · 9e071aa6
      Committed by Peter Zijlstra (Intel)
      commit 11f8b2d65ca9029591c8df26bb6bd063c312b7fe upstream
      
      Such that we can re-use it.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel: Make cpuc allocations consistent · f99f7dae
      Committed by Peter Zijlstra (Intel)
      commit d01b1f96a82e5dd7841a1d39db3abfdaf95f70ab upstream
      
      The cpuc data structure allocation is different between fake and real
      cpuc's; use the same code to init/free both.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/PCI: Fixup RTIT_BAR of Intel Denverton Trace Hub · 36e3673d
      Committed by Alexander Shishkin
      commit 2e095ce7b6ecce2f3e2ff330527f12056ed1e1a1 upstream.
      
      On Denverton's integration of the Intel(R) Trace Hub (for a reference and
      overview see Documentation/trace/intel_th.rst) the reported size of one of
      its resources (RTIT_BAR) doesn't match its actual size, which leads to
      overlaps with other devices' resources.
      
      In practice, it overlaps with XHCI MMIO space, which results in the xhci
      driver bailing out after seeing its registers as 0xffffffff, and perceived
      disappearance of all USB devices:
      
        intel_th_pci 0000:00:1f.7: enabling device (0004 -> 0006)
        xhci_hcd 0000:00:15.0: xHCI host controller not responding, assume dead
        xhci_hcd 0000:00:15.0: xHC not responding in xhci_irq, assume controller is dead
        xhci_hcd 0000:00:15.0: HC died; cleaning up
        usb 1-1: USB disconnect, device number 2
      
      For this reason, we need to resize the RTIT_BAR on Denverton to its actual
      size, which in this case is 4MB.  The corresponding erratum is DNV36 at the
      link below:
      
        DNV36.       Processor Host Root Complex May Incorrectly Route Memory
                     Accesses to Intel® Trace Hub
      
        Problem:     The Intel® Trace Hub RTIT_BAR (B0:D31:F7 offset 20h) is
                     reported as a 2KB memory range.  Due to this erratum, the
                     processor Host Root Complex will forward addresses from
                     RTIT_BAR to RTIT_BAR + 4MB -1 to Intel® Trace Hub.
      
        Implication: Devices assigned within the RTIT_BAR to RTIT_BAR + 4MB -1
                     space may not function correctly.
      
        Workaround:  A BIOS code change has been identified and may be
                     implemented as a workaround for this erratum.
      
        Status:      No Fix.
      
      Note that 5118ccd3 ("intel_th: pci: Add Denverton SOC support") updates
      the Trace Hub driver so it claims the Denverton device, but the resource
      overlap exists regardless of whether that driver is loaded or that commit
      is included.
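      
      A heavily hedged sketch of the shape of such a fixup (the device ID,
      resource index and size check below are assumptions, not taken from
      the patch):
      
        /* Resize the under-reported RTIT_BAR so the full 4MB the root
         * complex actually decodes is reserved and cannot be assigned to
         * another device. */
        static void quirk_intel_th_dnv(struct pci_dev *dev)
        {
        	struct resource *r = &dev->resource[4];	/* RTIT_BAR (assumed) */
        
        	if (r->end == r->start + 0x7ff) {	/* reported as 2KB */
        		r->start = 0;
        		r->end   = 0x3fffff;		/* actually 4MB */
        		r->flags |= IORESOURCE_UNSET;
        	}
        }
        DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x19e1, quirk_intel_th_dnv);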
      
      Link: https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/atom-c3000-family-spec-update.pdf
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      [bhelgaas: include erratum text, clarify relationship with 5118ccd3]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86_64: increase stack size for KASAN_EXTRA · 5edeae21
      Committed by Qian Cai
      [ Upstream commit a8e911d13540487942d53137c156bd7707f66e5d ]
      
      If the kernel is configured with KASAN_EXTRA, the stack size is
      increased significantly because this option sets "-fstack-reuse" to
      "none" in GCC [1].  As a result, it triggers stack overrun quite often
      with a 32k stack size compiled using GCC 8.  For example, this reproducer
      
        https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c
      
      triggers a "corrupted stack end detected inside scheduler" very reliably
      with CONFIG_SCHED_STACK_END_CHECK enabled.
      
      There are just too many functions that could have a large stack with
      KASAN_EXTRA due to large local variables that have been called over and
      over again without being able to reuse the stacks.  Some noticeable ones
      are
      
        size
        7648 shrink_page_list
        3584 xfs_rmap_convert
        3312 migrate_page_move_mapping
        3312 dev_ethtool
        3200 migrate_misplaced_transhuge_page
        3168 copy_process
      
      There are another 49 functions over 2k in size when compiling the kernel
      with "-Wframe-larger-than=" even with a related minimal config on this
      machine.  Hence, it is too much work to change the Makefiles for each
      object to compile without "-fsanitize-address-use-after-scope"
      individually.
      
      [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c23
      
      Although there is a patch in GCC 9 to help the situation, GCC 9 probably
      won't be released for a few months, and it will probably take another
      six months to a year for all major distros to include it as a default.
      Hence, the stack usage with KASAN_EXTRA can be revisited again in 2020
      when GCC 9 is everywhere.  Until then, this patch will help users avoid
      stack overrun.
      
      This has already been fixed for arm64 for the same reason via
      6e8830674ea ("arm64: kasan: Increase stack size for KASAN_EXTRA").
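      
      The fix itself is a bump of the x86-64 kernel stack order under this
      configuration (a sketch of arch/x86/include/asm/page_64_types.h; the
      exact ifdef nesting may differ):
      
        #ifdef CONFIG_KASAN
        #ifdef CONFIG_KASAN_EXTRA
        #define KASAN_STACK_ORDER 2	/* 64k stacks */
        #else
        #define KASAN_STACK_ORDER 1	/* 32k stacks */
        #endif
        #else
        #define KASAN_STACK_ORDER 0
        #endif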
      
      Link: http://lkml.kernel.org/r/20190109215209.2903-1-cai@lca.pw
      Signed-off-by: Qian Cai <cai@lca.pw>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/kexec: Don't setup EFI info if EFI runtime is not enabled · e2b45446
      Committed by Kairui Song
      [ Upstream commit 2aa958c99c7fd3162b089a1a56a34a0cdb778de1 ]
      
      Kexec-ing a kernel with "efi=noruntime" on the first kernel's command
      line causes the following null pointer dereference:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        #PF error: [normal kernel read fault]
        Call Trace:
         efi_runtime_map_copy+0x28/0x30
         bzImage64_load+0x688/0x872
         arch_kexec_kernel_image_load+0x6d/0x70
         kimage_file_alloc_init+0x13e/0x220
         __x64_sys_kexec_file_load+0x144/0x290
         do_syscall_64+0x55/0x1a0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Just skip the EFI info setup if EFI runtime services are not enabled.
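      
      A sketch of the guard (efi_enabled()/EFI_RUNTIME_SERVICES are the
      standard kernel API; the exact placement in the bzImage64 loader is
      illustrative):
      
        /* Booted with efi=noruntime: there is no runtime map to hand
         * over to the next kernel, so don't touch it. */
        if (!efi_enabled(EFI_RUNTIME_SERVICES))
        	return 0;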
      
       [ bp: Massage commit message. ]
      Suggested-by: Dave Young <dyoung@redhat.com>
      Signed-off-by: Kairui Song <kasong@redhat.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: bhe@redhat.com
      Cc: David Howells <dhowells@redhat.com>
      Cc: erik.schmauss@intel.com
      Cc: fanc.fnst@cn.fujitsu.com
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: kexec@lists.infradead.org
      Cc: lenb@kernel.org
      Cc: linux-acpi@vger.kernel.org
      Cc: Philipp Rudo <prudo@linux.vnet.ibm.com>
      Cc: rafael.j.wysocki@intel.com
      Cc: robert.moore@intel.com
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Cc: Yannik Sembritzki <yannik@sembritzki.me>
      Link: https://lkml.kernel.org/r/20190118111310.29589-2-kasong@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/microcode/amd: Don't falsely trick the late loading mechanism · f964a4d2
      Committed by Thomas Lendacky
      [ Upstream commit 912139cfbfa6a2bc1da052314d2c29338dae1f6a ]
      
      The load_microcode_amd() function searches for microcode patches and
      attempts to apply a microcode patch if it is of a different level than
      the currently installed level.
      
      While the processor won't actually load a level that is less than
      what is already installed, the logic wrongly returns UCODE_NEW thus
      signaling to its caller reload_store() that a late loading should be
      attempted.
      
      If the file-system contains an older microcode revision than what is
      currently running, such a late microcode reload can result in these
      misleading messages:
      
        x86/CPU: CPU features have changed after loading microcode, but might not take effect.
        x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.
      
      These messages were issued on a system where SME/SEV are not
      enabled by the BIOS (MSR C001_0010[23] = 0b) because, during boot,
      early_detect_mem_encrypt() is called and clears the SME and SEV
      features in this case.
      
      However, after the wrong late load attempt, get_cpu_cap() is called and
      reloads the SME and SEV feature bits, resulting in the messages.
      
      Update the microcode level check to not attempt microcode loading if the
      current level is greater than, and not only equal to, the current patch
      level.
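      
      In code terms, the comparison becomes strict (a sketch with
      illustrative names):
      
        /* Report new microcode only when the patch is really newer than
         * the running revision; older or equal levels are a no-op. */
        if (p->patch_id > rev)
        	ret = UCODE_NEW;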
      
       [ bp: massage commit message. ]
      
      Fixes: 2613f36e ("x86/microcode: Attempt late loading only when new microcode is present")
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/154894518427.9406.8246222496874202773.stgit@tlendack-t1.amdoffice.net
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode · 15fb5d73
      Committed by Wei Huang
      [ Upstream commit b677dfae5aa197afc5191755a76a8727ffca538a ]
      
      In some old AMD KVM implementations, the guest's EFER.LME bit is cleared
      by KVM when the hypervisor detects that the guest sets CR0.PG to 0. This
      causes the guest OS to reboot when it tries to return from 32-bit
      trampoline code because the CPU is in an incorrect state: CR4.PAE=1,
      CR0.PG=1, CS.L=1, but EFER.LME=0.  As a precaution, set EFER.LME=1 as
      part of the long mode activation procedure. This extra step won't cause
      any harm when Linux is booted on a bare-metal machine.
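      
      A sketch of the added trampoline step (MSR_EFER and _EFER_LME are the
      standard definitions; the exact placement in head_64.S is
      illustrative):
      
        /* Re-set long mode explicitly; some old AMD KVMs clear EFER.LME
         * when the guest drops CR0.PG, so don't rely on it surviving. */
        movl	$MSR_EFER, %ecx
        rdmsr
        btsl	$_EFER_LME, %eax
        wrmsr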
      Signed-off-by: Wei Huang <wei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: bp@alien8.de
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/20190104054411.12489-1-wei@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  7. 10 March 2019, 2 commits
  8. 06 March 2019, 3 commits