1. 12 5月, 2016 1 次提交
  2. 03 5月, 2016 6 次提交
  3. 13 4月, 2016 1 次提交
  4. 18 2月, 2016 2 次提交
  5. 01 2月, 2016 1 次提交
  6. 19 12月, 2015 1 次提交
    • A
      x86/mce: Ensure offline CPUs don't participate in rendezvous process · d90167a9
      Ashok Raj 提交于
      Intel's MCA implementation broadcasts MCEs to all CPUs on the
      node. This poses a problem for offlined CPUs which cannot
      participate in the rendezvous process:
      
        Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
        Kernel Offset: disabled
        Rebooting in 100 seconds..
      
      More specifically, Linux does a soft offline of a CPU when
      writing a 0 to /sys/devices/system/cpu/cpuX/online, which
      doesn't prevent the #MC exception from being broadcasted to that
      CPU.
      
      Ensure that offline CPUs don't participate in the MCE rendezvous
      and clear the RIP valid status bit so that a second MCE won't
      cause a shutdown.
      
      Without the patch, mce_start() will increment mce_callin and
      wait for all CPUs. Offlined CPUs should avoid participating in
      the rendezvous process altogether.
      Signed-off-by: NAshok Raj <ashok.raj@intel.com>
      [ Massage commit message. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NTony Luck <tony.luck@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1449742346-21470-2-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      d90167a9
  7. 24 11月, 2015 4 次提交
  8. 01 11月, 2015 2 次提交
  9. 28 9月, 2015 1 次提交
    • A
      x86/mce: Don't clear shared banks on Intel when offlining CPUs · 6e06780a
      Ashok Raj 提交于
      It is not safe to clear global MCi_CTL banks during CPU offline
      or suspend/resume operations. These MSRs are either
      thread-scoped (meaning private to a thread), or core-scoped
      (private to threads in that core only), or with a socket scope:
      visible and controllable from all threads in the socket.
      
      When we offline a single CPU, clearing those MCi_CTL bits will
      stop signaling for all the shared, i.e., socket-wide resources,
      such as LLC, iMC, etc.
      
      In addition, it might be possible to compromise the integrity of
      an Intel Secure Guard eXtentions (SGX) system if the attacker
      has control of the host system and is able to inject errors
      which would be otherwise ignored when MCi_CTL bits are cleared.
      
      Hence on SGX enabled systems, if MCi_CTL is cleared, SGX gets
      disabled.
      Tested-by: NSerge Ayoun <serge.ayoun@intel.com>
      Signed-off-by: NAshok Raj <ashok.raj@intel.com>
      [ Cleanup text. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NTony Luck <tony.luck@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1441391390-16985-1-git-send-email-ashok.raj@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6e06780a
  10. 13 8月, 2015 8 次提交
  11. 23 7月, 2015 1 次提交
  12. 07 7月, 2015 1 次提交
    • A
      x86/entry: Remove exception_enter() from most trap handlers · 8c84014f
      Andy Lutomirski 提交于
      On 64-bit kernels, we don't need it any more: we handle context
      tracking directly on entry from user mode and exit to user mode.
      
      On 32-bit kernels, we don't support context tracking at all, so
      these callbacks had no effect.
      
      Note: this doesn't change do_page_fault().  Before we do that,
      we need to make sure that there is no code that can page fault
      from kernel mode with CONTEXT_USER.  The 32-bit fast system call
      stack argument code is the only offender I'm aware of right now.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/ae22f4dfebd799c916574089964592be218151f9.1435952415.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8c84014f
  13. 06 7月, 2015 2 次提交
  14. 07 6月, 2015 2 次提交
  15. 28 5月, 2015 2 次提交
  16. 27 5月, 2015 1 次提交
  17. 18 5月, 2015 1 次提交
    • B
      x86/mce: Fix MCE severity messages · 17fea54b
      Borislav Petkov 提交于
      Derek noticed that a critical MCE gets reported with the wrong
      error type description:
      
        [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 9: f200003f000100b0
        [Hardware Error]: RIP !INEXACT! 10:<ffffffff812e14c1> {intel_idle+0xb1/0x170}
        [Hardware Error]: TSC 49587b8e321cb
        [Hardware Error]: PROCESSOR 0:306e4 TIME 1431561296 SOCKET 1 APIC 29
        [Hardware Error]: Some CPUs didn't answer in synchronization
        [Hardware Error]: Machine check: Invalid
      				   ^^^^^^^
      
      The last line with 'Invalid' should have printed the high level
      MCE error type description we get from mce_severity, i.e.
      something like:
      
        [Hardware Error]: Machine check: Action required: data load error in a user process
      
      this happens due to the fact that mce_no_way_out() iterates over
      all MCA banks and possibly overwrites the @msg argument which is
      used in the panic printing later.
      
      Change behavior to take the message of only and the (last)
      critical MCE it detects.
      Reported-by: NDerek <denc716@gmail.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1431936437-25286-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      17fea54b
  18. 07 5月, 2015 1 次提交
    • A
      x86/mce: Add support for deferred errors on AMD · 7559e13f
      Aravind Gopalakrishnan 提交于
      Deferred errors indicate error conditions that were not corrected, but
      those errors have not been consumed yet. They require no action from
      S/W (or action is optional). These errors provide info about a latent
      uncorrectable MCE that can occur when a poisoned data is consumed by the
      processor.
      
      Newer AMD processors can generate deferred errors and can be configured
      to generate APIC interrupts on such events.
      
      SUCCOR stands for S/W UnCorrectable error COntainment and Recovery.
      It indicates support for data poisoning in HW and deferred error
      interrupts.
      
      Add new bitfield to mce_vendor_flags for this. We use this to verify
      presence of deferred error interrupts before we enable them in mce_amd.c
      
      While at it, clarify comments in mce_vendor_flags to provide an
      indication of usages of the bitfields.
      Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1430913538-1415-4-git-send-email-Aravind.Gopalakrishnan@amd.com
      [ beef up commit message, do CPUID(8000_0007) only once. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      7559e13f
  19. 24 3月, 2015 2 次提交