1. 31 5月, 2019 7 次提交
    • Y
      x86/mce: Handle varying MCA bank counts · 75a96196
      Yazen Ghannam 提交于
      [ Upstream commit 006c077041dc73b9490fffc4c6af5befe0687110 ]
      
      Linux reads MCG_CAP[Count] to find the number of MCA banks visible to a
      CPU. Currently, this number is the same for all CPUs and a warning is
      shown if there is a difference. The number of banks is overwritten with
      the MCG_CAP[Count] value of each following CPU that boots.
      
      According to the Intel SDM and AMD APM, the MCG_CAP[Count] value gives
      the number of banks that are available to a "processor implementation".
      The AMD BKDGs/PPRs further clarify that this value is per core. This
      value has historically been the same for every core in the system, but
      that is not an architectural requirement.
      
      Future AMD systems may have different MCG_CAP[Count] values per core,
      so the assumption that all CPUs will have the same MCG_CAP[Count] value
      will no longer be valid.
      
      Also, the first CPU to boot will allocate the struct mce_banks[] array
      using the number of banks based on its MCG_CAP[Count] value. The machine
      check handler and other functions use the global number of banks to
      iterate and index into the mce_banks[] array. So it's possible to use an
      out-of-bounds index on an asymmetric system where a following CPU sees a
      MCG_CAP[Count] value greater than its predecessors.
      
      Thus, allocate the mce_banks[] array to the maximum number of banks.
      This will avoid the potential out-of-bounds index since the value of
      mca_cfg.banks is capped to MAX_NR_BANKS.
      
      Set the value of mca_cfg.banks equal to the max of the previous value
      and the value for the current CPU. This way mca_cfg.banks will always
      represent the max number of banks detected on any CPU in the system.
      
      This will ensure that all CPUs will access all the banks that are
      visible to them. A CPU that can access fewer than the max number of
      banks will find the registers of the extra banks to be read-as-zero.
      
      Furthermore, print the resulting number of MCA banks in use. Do this in
      mcheck_late_init() so that the final value is printed after all CPUs
      have been initialized.
      
      Finally, get bank count from target CPU when doing injection with mce-inject
      module.
      
       [ bp: Remove out-of-bounds example, passify and cleanup commit message. ]
      Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20180727214009.78289-1-Yazen.Ghannam@amd.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      75a96196
    • T
      x86/mce: Fix machine_check_poll() tests for error types · 3d036cba
      Tony Luck 提交于
      [ Upstream commit f19501aa07f18268ab14f458b51c1c6b7f72a134 ]
      
      There has been a lurking "TBD" in the machine check poll routine ever
      since it was first split out from the machine check handler. The
      potential issue is that the poll routine may have just begun a read from
      the STATUS register in a machine check bank when the hardware logs an
      error in that bank and signals a machine check.
      
      That race used to be pretty small back when machine checks were
      broadcast, but the addition of local machine check means that the poll
      code could continue running and clear the error from the bank before the
      local machine check handler on another CPU gets around to reading it.
      
      Fix the code to be sure to only process errors that need to be processed
      in the poll code, leaving other logged errors alone for the machine
      check handler to find and process.
      
       [ bp: Massage a bit and flip the "== 0" check to the usual !(..) test. ]
      
      Fixes: b79109c3 ("x86, mce: separate correct machine check poller and fatal exception handler")
      Fixes: ed7290d0 ("x86, mce: implement new status bits")
      Reported-by: NAshok Raj <ashok.raj@intel.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Link: https://lkml.kernel.org/r/20190312170938.GA23035@agluck-deskSigned-off-by: NSasha Levin <sashal@kernel.org>
      3d036cba
    • P
      x86/uaccess, signal: Fix AC=1 bloat · 4614b0bb
      Peter Zijlstra 提交于
      [ Upstream commit 88e4718275c1bddca6f61f300688b4553dc8584b ]
      
      Occasionally GCC is less agressive with inlining and the following is
      observed:
      
        arch/x86/kernel/signal.o: warning: objtool: restore_sigcontext()+0x3cc: call to force_valid_ss.isra.5() with UACCESS enabled
        arch/x86/kernel/signal.o: warning: objtool: do_signal()+0x384: call to frame_uc_flags.isra.0() with UACCESS enabled
      
      Cure this by moving this code out of the AC=1 region, since it really
      isn't needed for the user access.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4614b0bb
    • B
      x86/microcode: Fix the ancient deprecated microcode loading method · a07de9b9
      Borislav Petkov 提交于
      [ Upstream commit 24613a04ad1c0588c10f4b5403ca60a73d164051 ]
      
      Commit
      
        2613f36e ("x86/microcode: Attempt late loading only when new microcode is present")
      
      added the new define UCODE_NEW to denote that an update should happen
      only when newer microcode (than installed on the system) has been found.
      
      But it missed adjusting that for the old /dev/cpu/microcode loading
      interface. Fix it.
      
      Fixes: 2613f36e ("x86/microcode: Attempt late loading only when new microcode is present")
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Jann Horn <jannh@google.com>
      Link: https://lkml.kernel.org/r/20190405133010.24249-3-bp@alien8.deSigned-off-by: NSasha Levin <sashal@kernel.org>
      a07de9b9
    • T
      x86/irq/64: Limit IST stack overflow check to #DB stack · f843f848
      Thomas Gleixner 提交于
      [ Upstream commit 7dbcf2b0b770eeb803a416ee8dcbef78e6389d40 ]
      
      Commit
      
        37fe6a42 ("x86: Check stack overflow in detail")
      
      added a broad check for the full exception stack area, i.e. it considers
      the full exception stack area as valid.
      
      That's wrong in two aspects:
      
       1) It does not check the individual areas one by one
      
       2) #DF, NMI and #MCE are not enabling interrupts which means that a
          regular device interrupt cannot happen in their context. In fact if a
          device interrupt hits one of those IST stacks that's a bug because some
          code path enabled interrupts while handling the exception.
      
      Limit the check to the #DB stack and consider all other IST stacks as
      'overflow' or invalid.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160143.682135110@linutronix.deSigned-off-by: NSasha Levin <sashal@kernel.org>
      f843f848
    • K
      x86/build: Move _etext to actual end of .text · 0fcb3cd5
      Kees Cook 提交于
      [ Upstream commit 392bef709659abea614abfe53cf228e7a59876a4 ]
      
      When building x86 with Clang LTO and CFI, CFI jump regions are
      automatically added to the end of the .text section late in linking. As a
      result, the _etext position was being labelled before the appended jump
      regions, causing confusion about where the boundaries of the executable
      region actually are in the running kernel, and broke at least the fault
      injection code. This moves the _etext mark to outside (and immediately
      after) the .text area, as it already the case on other architectures
      (e.g. arm64, arm).
      Reported-and-tested-by: NSami Tolvanen <samitolvanen@google.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190423183827.GA4012@beastSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0fcb3cd5
    • N
      x86/modules: Avoid breaking W^X while loading modules · 8715ce03
      Nadav Amit 提交于
      [ Upstream commit f2c65fb3221adc6b73b0549fc7ba892022db9797 ]
      
      When modules and BPF filters are loaded, there is a time window in
      which some memory is both writable and executable. An attacker that has
      already found another vulnerability (e.g., a dangling pointer) might be
      able to exploit this behavior to overwrite kernel code. Prevent having
      writable executable PTEs in this stage.
      
      In addition, avoiding having W+X mappings can also slightly simplify the
      patching of modules code on initialization (e.g., by alternatives and
      static-key), as would be done in the next patch. This was actually the
      main motivation for this patch.
      
      To avoid having W+X mappings, set them initially as RW (NX) and after
      they are set as RO set them as X as well. Setting them as executable is
      done as a separate step to avoid one core in which the old PTE is cached
      (hence writable), and another which sees the updated PTE (executable),
      which would break the W^X protection.
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Suggested-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NNadav Amit <namit@vmware.com>
      Signed-off-by: NRick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <akpm@linux-foundation.org>
      Cc: <ard.biesheuvel@linaro.org>
      Cc: <deneen.t.dock@intel.com>
      Cc: <kernel-hardening@lists.openwall.com>
      Cc: <kristen@linux.intel.com>
      Cc: <linux_dti@icloud.com>
      Cc: <will.deacon@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Link: https://lkml.kernel.org/r/20190426001143.4983-12-namit@vmware.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8715ce03
  2. 26 5月, 2019 1 次提交
  3. 22 5月, 2019 2 次提交
  4. 17 5月, 2019 3 次提交
  5. 15 5月, 2019 18 次提交
  6. 08 5月, 2019 1 次提交
  7. 02 5月, 2019 1 次提交
    • S
      x86/fpu: Don't export __kernel_fpu_{begin,end}() · d4ff57d0
      Sebastian Andrzej Siewior 提交于
      commit 12209993e98c5fa1855c467f22a24e3d5b8be205 upstream.
      
      There is one user of __kernel_fpu_begin() and before invoking it,
      it invokes preempt_disable(). So it could invoke kernel_fpu_begin()
      right away. The 32bit version of arch_efi_call_virt_setup() and
      arch_efi_call_virt_teardown() does this already.
      
      The comment above *kernel_fpu*() claims that before invoking
      __kernel_fpu_begin() preemption should be disabled and that KVM is a
      good example of doing it. Well, KVM doesn't do that since commit
      
        f775b13e ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")
      
      so it is not an example anymore.
      
      With EFI gone as the last user of __kernel_fpu_{begin|end}(), both can
      be made static and not exported anymore.
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NRik van Riel <riel@surriel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: linux-efi <linux-efi@vger.kernel.org>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20181129150210.2k4mawt37ow6c2vq@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4ff57d0
  8. 27 4月, 2019 3 次提交
  9. 20 4月, 2019 4 次提交