1. 24 1月, 2017 3 次提交
  2. 23 11月, 2016 1 次提交
  3. 16 11月, 2016 5 次提交
  4. 11 11月, 2016 1 次提交
    • B
      x86/MCE: Correct TSC timestamping of error records · 54467353
      Borislav Petkov 提交于
      We did have logic in the MCE code which would TSC-timestamp an error
      record only when it is exact - i.e., when it wasn't detected by polling.
      This isn't the case anymore. So let's fix that:
      
      We have a valid TSC timestamp in the error record only when it has been
      a precise detection, i.e., either in the #MC handler or in one of the
      interrupt handlers (thresholding, deferred, ...).
      
      All other error records still have mce.time which contains the wall
      time in order to be able to place the error record in time at least
      approximately.
      
      Also, this fixes another bug where machine_check_poll() would clear
      mce.tsc unconditionally even if we requested precise MCP_TIMESTAMP
      logging.
      
      The proper fix would be to generate timestamp only when it has been
      requested and not always. But that would require a more thorough code
      audit of all mce_gather_info/mce_setup() users. Add a FIXME for now.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony <tony.luck@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: kernel test robot <xiaolong.ye@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: lkp@01.org
      Link: http://lkml.kernel.org/r/20161110131053.kybsijfs5venpjnf@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      54467353
  5. 09 11月, 2016 1 次提交
  6. 18 9月, 2016 1 次提交
    • T
      mce, workqueue: remove keventd_up() usage · a2c2727d
      Tejun Heo 提交于
      Now that workqueue can handle work item queueing from very early
      during boot, there is no need to gate schedule_work() with
      keventd_up().  Remove it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: linux-edac@vger.kernel.org
      a2c2727d
  7. 13 9月, 2016 4 次提交
  8. 05 9月, 2016 2 次提交
  9. 08 7月, 2016 1 次提交
    • B
      x86/mce: Fix mce_rdmsrl() warning message · 38c54ccb
      Borislav Petkov 提交于
      The MSR address we're dumping in there should be in hex, otherwise we
      get funsies like:
      
        [    0.016000] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:428 mce_rdmsrl+0xd9/0xe0
        [    0.016000] mce: Unable to read msr -1073733631!
      				       ^^^^^^^^^^^
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1467968983-4874-5-git-send-email-bp@alien8.de
      [ Fixed capitalization of 'MSR'. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      38c54ccb
  10. 07 7月, 2016 1 次提交
  11. 12 5月, 2016 1 次提交
  12. 03 5月, 2016 6 次提交
  13. 13 4月, 2016 1 次提交
  14. 18 2月, 2016 2 次提交
  15. 01 2月, 2016 1 次提交
  16. 19 12月, 2015 1 次提交
    • A
      x86/mce: Ensure offline CPUs don't participate in rendezvous process · d90167a9
      Ashok Raj 提交于
      Intel's MCA implementation broadcasts MCEs to all CPUs on the
      node. This poses a problem for offlined CPUs which cannot
      participate in the rendezvous process:
      
        Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
        Kernel Offset: disabled
        Rebooting in 100 seconds..
      
      More specifically, Linux does a soft offline of a CPU when
      writing a 0 to /sys/devices/system/cpu/cpuX/online, which
      doesn't prevent the #MC exception from being broadcasted to that
      CPU.
      
      Ensure that offline CPUs don't participate in the MCE rendezvous
      and clear the RIP valid status bit so that a second MCE won't
      cause a shutdown.
      
      Without the patch, mce_start() will increment mce_callin and
      wait for all CPUs. Offlined CPUs should avoid participating in
      the rendezvous process altogether.
      Signed-off-by: NAshok Raj <ashok.raj@intel.com>
      [ Massage commit message. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NTony Luck <tony.luck@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1449742346-21470-2-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      d90167a9
  17. 24 11月, 2015 4 次提交
  18. 01 11月, 2015 2 次提交
  19. 28 9月, 2015 1 次提交
    • A
      x86/mce: Don't clear shared banks on Intel when offlining CPUs · 6e06780a
      Ashok Raj 提交于
      It is not safe to clear global MCi_CTL banks during CPU offline
      or suspend/resume operations. These MSRs are either
      thread-scoped (meaning private to a thread), or core-scoped
      (private to threads in that core only), or with a socket scope:
      visible and controllable from all threads in the socket.
      
      When we offline a single CPU, clearing those MCi_CTL bits will
      stop signaling for all the shared, i.e., socket-wide resources,
      such as LLC, iMC, etc.
      
      In addition, it might be possible to compromise the integrity of
      an Intel Secure Guard eXtentions (SGX) system if the attacker
      has control of the host system and is able to inject errors
      which would be otherwise ignored when MCi_CTL bits are cleared.
      
      Hence on SGX enabled systems, if MCi_CTL is cleared, SGX gets
      disabled.
      Tested-by: NSerge Ayoun <serge.ayoun@intel.com>
      Signed-off-by: NAshok Raj <ashok.raj@intel.com>
      [ Cleanup text. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NTony Luck <tony.luck@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1441391390-16985-1-git-send-email-ashok.raj@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6e06780a
  20. 13 8月, 2015 1 次提交