1. 31 May 2019, 17 commits
    • x86/mce: Handle varying MCA bank counts · 75a96196
      Committed by Yazen Ghannam
      [ Upstream commit 006c077041dc73b9490fffc4c6af5befe0687110 ]
      
      Linux reads MCG_CAP[Count] to find the number of MCA banks visible to a
      CPU. Currently, this number is the same for all CPUs and a warning is
      shown if there is a difference. The number of banks is overwritten with
      the MCG_CAP[Count] value of each following CPU that boots.
      
      According to the Intel SDM and AMD APM, the MCG_CAP[Count] value gives
      the number of banks that are available to a "processor implementation".
      The AMD BKDGs/PPRs further clarify that this value is per core. This
      value has historically been the same for every core in the system, but
      that is not an architectural requirement.
      
      Future AMD systems may have different MCG_CAP[Count] values per core,
      so the assumption that all CPUs will have the same MCG_CAP[Count] value
      will no longer be valid.
      
      Also, the first CPU to boot will allocate the struct mce_banks[] array
      using the number of banks based on its MCG_CAP[Count] value. The machine
      check handler and other functions use the global number of banks to
      iterate and index into the mce_banks[] array. So it's possible to use an
      out-of-bounds index on an asymmetric system where a following CPU sees a
      MCG_CAP[Count] value greater than its predecessors.
      
      Thus, allocate the mce_banks[] array to the maximum number of banks.
      This will avoid the potential out-of-bounds index since the value of
      mca_cfg.banks is capped to MAX_NR_BANKS.
      
      Set the value of mca_cfg.banks equal to the max of the previous value
      and the value for the current CPU. This way mca_cfg.banks will always
      represent the max number of banks detected on any CPU in the system.
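
      A minimal sketch of that logic (the helper name is hypothetical; rdmsrl(),
      min_t()/max_t() and mca_cfg are the kernel facilities assumed here):

        /* Hedged sketch: cap the per-CPU bank count, track the maximum seen. */
        static void mce_track_bank_count(void)  /* hypothetical name */
        {
                u64 cap;
                u8 banks;

                rdmsrl(MSR_IA32_MCG_CAP, cap);
                banks = min_t(u8, cap & 0xff, MAX_NR_BANKS);    /* MCG_CAP[Count] */
                mca_cfg.banks = max_t(u8, mca_cfg.banks, banks); /* max across CPUs */
        }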
      
      This will ensure that all CPUs will access all the banks that are
      visible to them. A CPU that can access fewer than the max number of
      banks will find the registers of the extra banks to be read-as-zero.
      
      Furthermore, print the resulting number of MCA banks in use. Do this in
      mcheck_late_init() so that the final value is printed after all CPUs
      have been initialized.
      
      Finally, get the bank count from the target CPU when doing injection with
      the mce-inject module.
      
       [ bp: Remove out-of-bounds example, passify and cleanup commit message. ]
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20180727214009.78289-1-Yazen.Ghannam@amd.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      75a96196
    • x86/mce: Fix machine_check_poll() tests for error types · 3d036cba
      Committed by Tony Luck
      [ Upstream commit f19501aa07f18268ab14f458b51c1c6b7f72a134 ]
      
      There has been a lurking "TBD" in the machine check poll routine ever
      since it was first split out from the machine check handler. The
      potential issue is that the poll routine may have just begun a read from
      the STATUS register in a machine check bank when the hardware logs an
      error in that bank and signals a machine check.
      
      That race used to be pretty small back when machine checks were
      broadcast, but the addition of local machine check means that the poll
      code could continue running and clear the error from the bank before the
      local machine check handler on another CPU gets around to reading it.
      
      Fix the code to be sure to only process errors that need to be processed
      in the poll code, leaving other logged errors alone for the machine
      check handler to find and process.
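
      A hedged sketch of that filtering (MCi_STATUS flag macros as defined in the
      kernel; an illustration of the idea, not the exact upstream diff):

        /* Hedged sketch: inside the per-bank polling loop. */
        if (!(m.status & MCI_STATUS_VAL))
                continue;       /* nothing logged in this bank */
        if (m.status & (MCI_STATUS_PCC | MCI_STATUS_S | MCI_STATUS_AR))
                continue;       /* signaled/fatal: leave it for do_machine_check() */
        /* a corrected (or deferred) error: safe to log and clear here */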
      
       [ bp: Massage a bit and flip the "== 0" check to the usual !(..) test. ]
      
      Fixes: b79109c3 ("x86, mce: separate correct machine check poller and fatal exception handler")
      Fixes: ed7290d0 ("x86, mce: implement new status bits")
      Reported-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Link: https://lkml.kernel.org/r/20190312170938.GA23035@agluck-desk
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3d036cba
    • x86/uaccess: Fix up the fixup · fc242af8
      Committed by Peter Zijlstra
      [ Upstream commit b69656fa7ea2f75e47d7bd5b9430359fa46488af ]
      
      New tooling got confused about this:
      
        arch/x86/lib/memcpy_64.o: warning: objtool: .fixup+0x7: return with UACCESS enabled
      
      While the code isn't wrong, it is tedious (if at all possible) to
      figure out what function a particular chunk of .fixup belongs to.
      
      This then confuses the objtool uaccess validation. Instead of
      returning directly from the .fixup, jump back into the right function.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      fc242af8
    • x86/ia32: Fix ia32_restore_sigcontext() AC leak · 5007453c
      Committed by Peter Zijlstra
      [ Upstream commit 67a0514afdbb8b2fc70b771b8c77661a9cb9d3a9 ]
      
      Objtool spotted that we call native_load_gs_index() with AC set.
      Re-arrange the code to avoid that.
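
      A hedged sketch of the rearrangement (simplified; get_user_try/catch and
      load_gs_index() as in the kernel of that era):

        unsigned int gs;

        /* read the saved selector inside the uaccess (AC=1) region ... */
        get_user_try {
                get_user_ex(gs, (unsigned int __user *)&sc->gs);
        } get_user_catch(err);

        /* ... but load it only after AC has been cleared again */
        load_gs_index(gs);
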
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5007453c
    • x86/uaccess, signal: Fix AC=1 bloat · 4614b0bb
      Committed by Peter Zijlstra
      [ Upstream commit 88e4718275c1bddca6f61f300688b4553dc8584b ]
      
      Occasionally GCC is less aggressive with inlining and the following is
      observed:
      
        arch/x86/kernel/signal.o: warning: objtool: restore_sigcontext()+0x3cc: call to force_valid_ss.isra.5() with UACCESS enabled
        arch/x86/kernel/signal.o: warning: objtool: do_signal()+0x384: call to frame_uc_flags.isra.0() with UACCESS enabled
      
      Cure this by moving this code out of the AC=1 region, since it really
      isn't needed for the user access.
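
      A hedged sketch of the pattern (helper name taken from the objtool warning
      above; put_user_try/ex as in the kernel of that era):

        /* compute outside the AC=1 region ... */
        unsigned long uc_flags = frame_uc_flags(regs);

        put_user_try {
                /* ... and only store the result inside it */
                put_user_ex(uc_flags, &frame->uc.uc_flags);
        } put_user_catch(err);
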
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4614b0bb
    • x86/build: Keep local relocations with ld.lld · e758471b
      Committed by Kees Cook
      [ Upstream commit 7c21383f3429dd70da39c0c7f1efa12377a47ab6 ]
      
      The LLVM linker (ld.lld) defaults to removing local relocations, which
      causes KASLR boot failures. ld.bfd and ld.gold already handle this
      correctly. Add the explicit "--discard-none" flag during
      the link phase. There is no change in output for ld.bfd and ld.gold,
      but ld.lld now produces an image with all the needed relocations.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: clang-built-linux@googlegroups.com
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190404214027.GA7324@beast
      Link: https://github.com/ClangBuiltLinux/linux/issues/404
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e758471b
    • x86/microcode: Fix the ancient deprecated microcode loading method · a07de9b9
      Committed by Borislav Petkov
      [ Upstream commit 24613a04ad1c0588c10f4b5403ca60a73d164051 ]
      
      Commit
      
        2613f36e ("x86/microcode: Attempt late loading only when new microcode is present")
      
      added the new define UCODE_NEW to denote that an update should happen
      only when newer microcode (than installed on the system) has been found.
      
      But it missed adjusting that for the old /dev/cpu/microcode loading
      interface. Fix it.
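
      A hedged sketch of the idea (hypothetical helper names, not the upstream
      diff): the legacy loader must also report UCODE_NEW when newer microcode is
      found, so that the update path actually runs.

        /* Hedged sketch: legacy /dev/cpu/microcode write path. */
        enum ucode_state ret = load_microcode_from_user(buf, size); /* hypothetical */

        if (ret != UCODE_NEW)
                return 0;               /* nothing newer: skip the late load */
        return do_late_load();          /* hypothetical helper */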
      
      Fixes: 2613f36e ("x86/microcode: Attempt late loading only when new microcode is present")
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jann Horn <jannh@google.com>
      Link: https://lkml.kernel.org/r/20190405133010.24249-3-bp@alien8.de
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a07de9b9
    • perf/x86/intel/cstate: Add Icelake support · 1cd4902d
      Committed by Kan Liang
      [ Upstream commit f08c47d1f86c6dc666c7e659d94bf6d4492aa9d7 ]
      
      Icelake uses the same C-state residency events as Sandy Bridge.
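
      A hedged sketch of the wiring (macro and table names as used by the cstate
      driver of that era; the model constant is assumed):

        /* Hedged sketch: reuse the Sandy Bridge C-state events for Icelake. */
        X86_CSTATES_MODEL(INTEL_FAM6_ICELAKE_MOBILE, snb_cstates),
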
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-10-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1cd4902d
    • perf/x86/intel/rapl: Add Icelake support · ea6ff1bb
      Committed by Kan Liang
      [ Upstream commit b3377c3acb9e54cf86efcfe25f2e792bca599ed4 ]
      
      Icelake supports the same RAPL counters as Skylake.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-11-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ea6ff1bb
    • perf/x86/msr: Add Icelake support · 3a9a1fd1
      Committed by Kan Liang
      [ Upstream commit cf50d79a8cfe5adae37fec026220b009559bbeed ]
      
      Icelake is the same as the existing Skylake parts.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-12-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3a9a1fd1
    • x86/irq/64: Limit IST stack overflow check to #DB stack · f843f848
      Committed by Thomas Gleixner
      [ Upstream commit 7dbcf2b0b770eeb803a416ee8dcbef78e6389d40 ]
      
      Commit
      
        37fe6a42 ("x86: Check stack overflow in detail")
      
      added a broad check that treats the full exception stack area as valid.
      
      That's wrong in two aspects:
      
       1) It does not check the individual areas one by one
      
       2) #DF, NMI and #MCE do not enable interrupts, which means that a
          regular device interrupt cannot happen in their context. In fact, if a
          device interrupt hits one of those IST stacks, that's a bug, because some
          code path enabled interrupts while handling the exception.
      
      Limit the check to the #DB stack and consider all other IST stacks as
      'overflow' or invalid.
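
      A hedged sketch of the narrowed check (per-CPU symbols as in the 64-bit irq
      code of that era; not necessarily the exact upstream diff):

        /* Hedged sketch: only the #DB IST stack is a valid context here. */
        estack_bottom = (u64)__this_cpu_read(orig_ist.ist[DEBUG_STACK]);
        estack_top    = estack_bottom - DEBUG_STKSZ + STACK_TOP_MARGIN;
        if (regs->sp >= estack_top && regs->sp <= estack_bottom)
                return;                 /* within the #DB stack: fine */
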
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160143.682135110@linutronix.de
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f843f848
    • x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault() · f46ae1cd
      Committed by Jiri Kosina
      [ Upstream commit a65c88e16f32aa9ef2e8caa68ea5c29bd5eb0ff0 ]
      
      An in-NMI warning was added to vmalloc_fault() via:
      
        ebc8827f ("x86: Barf when vmalloc and kmemcheck faults happen in NMI")
      
      back when our NMI entry code could not cope with nested NMIs.
      
      These days, it's perfectly fine to take a fault in NMI context and we
      don't have to care about the fact that IRET from the fault handler might
      cause NMI nesting.
      
      This warning has already been removed from the 32-bit implementation of
      vmalloc_fault() in:
      
        6863ea0c ("x86/mm: Remove in_nmi() warning from vmalloc_fault()")
      
      but the 64-bit version was omitted.
      
      Remove the bogus warning from the 64-bit implementation of vmalloc_fault() as well.
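
      The change is effectively a one-liner; the stale assertion removed from the
      64-bit vmalloc_fault():

        WARN_ON_ONCE(in_nmi());  /* removed: faulting in NMI context is fine now */
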
      Reported-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 6863ea0c ("x86/mm: Remove in_nmi() warning from vmalloc_fault()")
      Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1904240902280.9803@cbobk.fhfr.pm
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f46ae1cd
    • x86/build: Move _etext to actual end of .text · 0fcb3cd5
      Committed by Kees Cook
      [ Upstream commit 392bef709659abea614abfe53cf228e7a59876a4 ]
      
      When building x86 with Clang LTO and CFI, CFI jump regions are
      automatically added to the end of the .text section late in linking. As a
      result, the _etext position was being labelled before the appended jump
      regions, causing confusion about where the boundaries of the executable
      region actually are in the running kernel, and breaking at least the fault
      injection code. Move the _etext mark to outside (and immediately after)
      the .text area, as is already the case on other architectures
      (e.g. arm64, arm).
      Reported-and-tested-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190423183827.GA4012@beast
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0fcb3cd5
    • x86/modules: Avoid breaking W^X while loading modules · 8715ce03
      Committed by Nadav Amit
      [ Upstream commit f2c65fb3221adc6b73b0549fc7ba892022db9797 ]
      
      When modules and BPF filters are loaded, there is a time window in
      which some memory is both writable and executable. An attacker that has
      already found another vulnerability (e.g., a dangling pointer) might be
      able to exploit this behavior to overwrite kernel code. Prevent having
      writable and executable PTEs at this stage.
      
      In addition, avoiding W+X mappings also slightly simplifies the
      patching of module code on initialization (e.g., by alternatives and
      static keys), as would be done in the next patch. This was actually the
      main motivation for this patch.
      
      To avoid W+X mappings, set module mappings initially as RW (NX), and only
      after they have been set RO, set them as X as well. Making them executable
      is a separate step, to avoid a window in which one core still has the old
      PTE cached (hence writable) while another already sees the updated PTE
      (executable), which would break the W^X protection.
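
      A hedged sketch of the two-step flip (set_memory_*() helpers as in the
      kernel; addresses and page counts elided):

        /* stage 1: module text starts out RW but not executable */
        set_memory_nx(addr, numpages);

        /* ... relocation and alternatives patching happen here ... */

        /* stage 2: read-only first, executable only afterwards */
        set_memory_ro(addr, numpages);
        set_memory_x(addr, numpages);
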
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Suggested-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <akpm@linux-foundation.org>
      Cc: <ard.biesheuvel@linaro.org>
      Cc: <deneen.t.dock@intel.com>
      Cc: <kernel-hardening@lists.openwall.com>
      Cc: <kristen@linux.intel.com>
      Cc: <linux_dti@icloud.com>
      Cc: <will.deacon@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Link: https://lkml.kernel.org/r/20190426001143.4983-12-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8715ce03
    • kvm: svm/avic: fix off-by-one in checking host APIC ID · 709a9305
      Committed by Suthikulpanit, Suravee
      commit c9bcd3e3335d0a29d89fabd2c385e1b989e6f1b0 upstream.
      
      The current logic does not allow a vCPU to be loaded onto a CPU with
      APIC ID 255. This should be allowed, since the host physical APIC ID
      field in the AVIC Physical APIC table entry is an 8-bit value,
      and APIC ID 255 is valid in a system with x2APIC enabled.
      Instead, do not allow a vCPU load if the host APIC ID cannot be
      represented by an 8-bit value.
      
      Also, use the more appropriate AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK
      instead of AVIC_MAX_PHYSICAL_ID_COUNT.
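
      A hedged sketch of the corrected check (mask name from the text above):

        /* reject only host APIC IDs that do not fit the 8-bit field */
        if (WARN_ON(h_physical_id & ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK))
                return;  /* was: h_physical_id >= AVIC_MAX_PHYSICAL_ID_COUNT */
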
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      709a9305
    • KVM: x86: fix return value for reserved EFER · 432ec4fa
      Committed by Paolo Bonzini
      commit 66f61c92889ff3ca365161fb29dd36d6354682ba upstream.
      
      Commit 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for
      host-initiated writes", 2019-04-02) introduced a "return false" in a
      function returning int, and anyway set_efer() has a "nonzero on error"
      convention, so it should be returning 1.
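
      A hedged sketch of the fix in set_efer() (an illustrative check, not the
      full upstream context; the convention is "nonzero on error"):

        if (efer & efer_reserved_bits)
                return 1;       /* was: return false, i.e. 0 == success */
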
      Reported-by: Pavel Machek <pavel@denx.de>
      Fixes: 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes")
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      432ec4fa
    • x86: Hide the int3_emulate_call/jmp functions from UML · 1d84eb87
      Committed by Steven Rostedt (VMware)
      commit 693713cbdb3a4bda5a8a678c31f06560bbb14657 upstream.
      
      User Mode Linux does not have access to the ip or sp fields of the pt_regs,
      and accessing them causes UML to fail to build. Hide the int3_emulate_jmp()
      and int3_emulate_call() functions from UML, as it doesn't need them
      anyway.
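
      A hedged sketch of the guard (CONFIG symbol as in the upstream change):

        #if !defined(CONFIG_UML_X86)
        static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
        {
                regs->ip = ip;  /* UML's pt_regs has no ip/sp fields */
        }
        /* int3_emulate_push()/int3_emulate_call() are hidden likewise */
        #endif
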
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d84eb87
  2. 26 May 2019, 6 commits
    • perf/x86/intel: Fix race in intel_pmu_disable_event() · a0b1dde1
      Committed by Jiri Olsa
      [ Upstream commit 6f55967ad9d9752813e36de6d5fdbd19741adfc7 ]
      
      A new race in x86_pmu_stop() was introduced by replacing the
      atomic __test_and_clear_bit() on cpuc->active_mask with separate
      test_bit() and __clear_bit() calls in the following commit:
      
        3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
      
      The race causes panic for PEBS events with enabled callchains:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        ...
        RIP: 0010:perf_prepare_sample+0x8c/0x530
        Call Trace:
         <NMI>
         perf_event_output_forward+0x2a/0x80
         __perf_event_overflow+0x51/0xe0
         handle_pmi_common+0x19e/0x240
         intel_pmu_handle_irq+0xad/0x170
         perf_event_nmi_handler+0x2e/0x50
         nmi_handle+0x69/0x110
         default_do_nmi+0x3e/0x100
         do_nmi+0x11a/0x180
         end_repeat_nmi+0x16/0x1a
        RIP: 0010:native_write_msr+0x6/0x20
        ...
         </NMI>
         intel_pmu_disable_event+0x98/0xf0
         x86_pmu_stop+0x6e/0xb0
         x86_pmu_del+0x46/0x140
         event_sched_out.isra.97+0x7e/0x160
        ...
      
      The event is configured to take samples from the PEBS drain code,
      but when it is disabled we go through the NMI path instead,
      where data->callchain will not get allocated and we crash:
      
                x86_pmu_stop
                  test_bit(hwc->idx, cpuc->active_mask)
                  intel_pmu_disable_event(event)
                  {
                    ...
                    intel_pmu_pebs_disable(event);
                    ...
      
      EVENT OVERFLOW ->  <NMI>
                           intel_pmu_handle_irq
                             handle_pmi_common
         TEST PASSES ->        test_bit(bit, cpuc->active_mask))
                                 perf_event_overflow
                                   perf_prepare_sample
                                   {
                                     ...
                                     if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
                                           data->callchain = perf_callchain(event, regs);
      
               CRASH ->              size += data->callchain->nr;
                                   }
                         </NMI>
                    ...
                    x86_pmu_disable_event(event)
                  }
      
                  __clear_bit(hwc->idx, cpuc->active_mask);
      
      Fix this by disabling the event itself before clearing the
      PEBS bit.
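
      A hedged sketch of the reordering in intel_pmu_disable_event() (the comment
      paraphrases the upstream reasoning):

        x86_pmu_disable_event(event);

        /*
         * Must come after x86_pmu_disable_event(): otherwise the counter
         * could still overflow without the PEBS bit set and take the NMI
         * path, where data->callchain is never set up.
         */
        if (unlikely(event->attr.precise_ip))
                intel_pmu_pebs_disable(event);
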
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Arcari <darcari@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Lendacky Thomas <Thomas.Lendacky@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
      Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a0b1dde1
    • x86/mm/mem_encrypt: Disable all instrumentation for early SME setup · f037116f
      Committed by Gary Hook
      [ Upstream commit b51ce3744f115850166f3d6c292b9c8cb849ad4f ]
      
      Enablement of AMD's Secure Memory Encryption feature is determined very
      early after start_kernel() is entered. Part of this procedure involves
      scanning the command line for the parameter 'mem_encrypt'.
      
      To determine intended state, the function sme_enable() uses library
      functions cmdline_find_option() and strncmp(). Their use occurs early
      enough such that it cannot be assumed that any instrumentation subsystem
      is initialized.
      
      For example, making calls to a KASAN-instrumented function before KASAN
      is set up will result in the use of uninitialized memory and a boot
      failure.
      
      When AMD's SME support is enabled, conditionally disable instrumentation
      of these dependent functions in lib/string.c and arch/x86/lib/cmdline.c.
      
       [ bp: Get rid of intermediary nostackp var and cleanup whitespace. ]
      
      Fixes: aca20d54 ("x86/mm: Add support to make use of Secure Memory Encryption")
      Reported-by: Li RongQing <lirongqing@baidu.com>
      Signed-off-by: Gary R Hook <gary.hook@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Boris Brezillon <bbrezillon@kernel.org>
      Cc: Coly Li <colyli@suse.de>
      Cc: "dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: "luto@kernel.org" <luto@kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: "mingo@redhat.com" <mingo@redhat.com>
      Cc: "peterz@infradead.org" <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/155657657552.7116.18363762932464011367.stgit@sosrh3.amd.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f037116f
    • x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012 · a0a49d87
      Committed by Vitaly Kuznetsov
      [ Upstream commit da66761c2d93a46270d69001abb5692717495a68 ]
      
      It was reported that with some special Multi Processor Group configuration,
      e.g.:
       bcdedit.exe /set groupsize 1
       bcdedit.exe /set maxgroup on
       bcdedit.exe /set groupaware on
      a 16-vCPU WS2012 guest shows a BSOD on boot when the PV TLB flush mechanism
      is in use.
      
      Tracing kvm_hv_flush_tlb immediately reveals the issue:
      
       kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2
      
      The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
      however, processor_mask is 0x0 and HV_FLUSH_ALL_PROCESSORS is not specified.
      We don't flush anything, and apparently that's not what Windows expects.
      
      TLFS doesn't say anything about such requests, and newer Windows versions
      seem to be unaffected. This all feels like a WS2012 bug which is, however,
      easy to work around in KVM: let's flush everything when we see an empty
      flush request, since over-flushing doesn't hurt.
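
      A hedged sketch of the workaround in kvm_hv_flush_tlb() (flag and field
      names from the trace above):

        /* an empty mask without ALL_PROCESSORS really means "all" */
        all_cpus = (flush.flags & HV_FLUSH_ALL_PROCESSORS) ||
                   flush.processor_mask == 0;
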
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a0a49d87
    • ftrace/x86_64: Emulate call function while updating in breakpoint handler · 07b487eb
      Committed by Peter Zijlstra
      commit 9e298e8604088a600d8100a111a532a9d342af09 upstream.
      
      Nicolai Stange discovered[1] that if live kernel patching is enabled, and
      the function tracer starts tracing the same function that was patched, the
      conversion of the fentry call site, while transitioning from calling the
      live kernel patch trampoline to the iterator trampoline, has a slight
      window in which it calls nothing at all.
      
      Live kernel patching depends on ftrace always calling its code, to prevent
      the traced function itself from running (ftrace redirects it). This small
      window would therefore allow the old, buggy function to be called, which
      can cause undesirable results.
      
      Nicolai submitted new patches[2], but these were controversial, as the
      problem is similar to the static-call emulation issues that came up a
      while ago[3]. After some debate[4][5], it emerged that adding a gap to the
      stack when entering the breakpoint handler allows the return address to be
      pushed onto the stack, easily emulating a call.
      
      [1] http://lkml.kernel.org/r/20180726104029.7736-1-nstange@suse.de
      [2] http://lkml.kernel.org/r/20190427100639.15074-1-nstange@suse.de
      [3] http://lkml.kernel.org/r/3cf04e113d71c9f8e4be95fb84a510f085aa4afa.1541711457.git.jpoimboe@redhat.com
      [4] http://lkml.kernel.org/r/CAHk-=wh5OpheSU8Em_Q3Hg8qw_JtoijxOdPtHru6d+5K8TWM=A@mail.gmail.com
      [5] http://lkml.kernel.org/r/CAHk-=wjvQxY4DvPrJ6haPgAa6b906h=MwZXO6G8OtiTGe=N7_w@mail.gmail.com
      
      [
        Live kernel patching is not implemented on x86_32, thus the emulate
        calls are only for x86_64.
      ]
      
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: the arch/x86 maintainers <x86@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nayna Jain <nayna@linux.ibm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@vger.kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: b700e7f0 ("livepatch: kernel: add support for live patching")
      Tested-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      [ Changed to only implement emulated calls for x86_64 ]
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      07b487eb
    • x86_64: Allow breakpoints to emulate call instructions · ba246f64
      Committed by Peter Zijlstra
      commit 4b33dadf37666c0860b88f9e52a16d07bf6d0b03 upstream.
      
      In order to allow breakpoints to emulate call instructions, they need to push
      the return address onto the stack. The x86_64 int3 handler adds a small gap
      to give the stack some room to grow. Use this gap to push the return address,
      making it possible to emulate a call instruction at the breakpoint location.
      
      These helper functions are added (a sketch follows the list):
      
        int3_emulate_jmp(): changes regs->ip so that execution resumes there.
      
       (The next two are only for x86_64)
        int3_emulate_push(): to push the address onto the gap in the stack
        int3_emulate_call(): push the return address and change regs->ip
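
      A hedged sketch of the helpers (x86_64; assumes INT3_INSN_SIZE == 1 and
      CALL_INSN_SIZE == 5 as defined alongside them):

        static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
        {
                regs->ip = ip;
        }

        static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
        {
                /* the int3 entry code reserved a gap below regs->sp for this */
                regs->sp -= sizeof(unsigned long);
                *(unsigned long *)regs->sp = val;
        }

        static inline void int3_emulate_call(struct pt_regs *regs, unsigned long func)
        {
                /* return address: the insn after the patched 5-byte call site */
                int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
                int3_emulate_jmp(regs, func);
        }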
      
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: the arch/x86 maintainers <x86@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nayna Jain <nayna@linux.ibm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@vger.kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: b700e7f0 ("livepatch: kernel: add support for live patching")
      Tested-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      [ Modified to only work for x86_64 and added comment to int3_emulate_push() ]
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba246f64
    • x86_64: Add gap to int3 to allow for call emulation · 01b6fdce
      Committed by Josh Poimboeuf
      commit 2700fefdb2d9751c416ad56897e27d41e409324a upstream.
      
      To allow an int3 handler to emulate a call instruction, it must be able to
      push a return address onto the stack. Add a gap to the stack to allow the
      int3 handler to push the return address, and change the return from int3
      so that it jumps straight to the target of the emulated call.
      
      Link: http://lkml.kernel.org/r/20181130183917.hxmti5josgq4clti@treble
      Link: http://lkml.kernel.org/r/20190502162133.GX2623@hirez.programming.kicks-ass.net
      
      [
        Note, this is needed to allow Live Kernel Patching to not miss calling a
        patched function when tracing is enabled. -- Steven Rostedt
      ]
      
      Cc: stable@vger.kernel.org
      Fixes: b700e7f0 ("livepatch: kernel: add support for live patching")
      Tested-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      01b6fdce
  3. 22 May 2019, 6 commits
  4. 17 May 2019, 5 commits
  5. 15 May 2019, 6 commits