1. 05 Oct, 2017 1 commit
  2. 29 Aug, 2017 1 commit
  3. 25 Jul, 2017 1 commit
    • x86/mce/AMD: Allow any CPU to initialize the smca_banks array · 9662d43f
      Yazen Ghannam committed
      Current SMCA implementations have the same banks on each CPU with the
      non-core banks only visible to a "master thread" on each die. Practically,
      this means the smca_banks array, which describes the banks, only needs to
      be populated once by a single master thread.
      
      CPU 0 seemed like a good candidate to do the populating. However, it's
      possible that CPU 0 is not enabled in which case the smca_banks array won't
      be populated.
      
      Rather than try to figure out another master thread to do the populating,
      we should just allow any CPU to populate the array.
      
      Drop the CPU 0 check and return early if the bank was already initialized.
      Also, drop the WARNing about an already initialized bank, since this will
      be a common, expected occurrence.
      
      The smca_banks array is only populated at boot time and CPUs are brought
      online sequentially. So there's no need for locking around the array.
      
      If the first CPU up is a master thread, then it will populate the array
      with all banks, core and non-core. Every CPU afterwards will return
      early. If the first CPU up is not a master thread, then it will populate
      the array with all core banks. The first CPU afterwards that is a master
      thread will skip populating the core banks and continue populating the
      non-core banks.
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Jack Miller <jack@codezen.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20170724101228.17326-4-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
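The population order described in the commit above can be sketched in userspace C. This is only an illustrative model, not the kernel code: `MAX_NR_BANKS`, `NR_CORE_BANKS`, `struct bank_desc`, `populate_bank()` and `cpu_online()` are hypothetical names and values chosen for the example, and the assumption that core banks occupy the lowest bank numbers is made purely to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NR_BANKS  8   /* hypothetical size, not the kernel's value */
#define NR_CORE_BANKS 5   /* hypothetical: banks 0..4 are core banks */

struct bank_desc {
    bool initialized;     /* stands in for "this smca_banks entry is set" */
    int  populated_by;    /* CPU that filled the entry, for illustration */
};

static struct bank_desc banks[MAX_NR_BANKS];

/* Fill one entry; return early if an earlier CPU already did so. */
static bool populate_bank(int cpu, int bank)
{
    assert(bank >= 0 && bank < MAX_NR_BANKS);

    if (banks[bank].initialized)
        return false;     /* common, expected case: no warning */

    banks[bank].initialized = true;
    banks[bank].populated_by = cpu;
    return true;
}

/*
 * Called once per CPU. CPUs come online one after another at boot,
 * which is why this sketch (like the commit) uses no locking.
 * Returns how many entries this CPU populated.
 */
static int cpu_online(int cpu, bool is_master_thread)
{
    /* Only a master thread can see the non-core banks. */
    int nr_visible = is_master_thread ? MAX_NR_BANKS : NR_CORE_BANKS;
    int filled = 0;

    for (int bank = 0; bank < nr_visible; bank++)
        if (populate_bank(cpu, bank))
            filled++;

    return filled;
}
```

With these assumptions, a non-master CPU that comes up first fills only the core banks, the first master thread after it fills only the remaining non-core banks, and every later CPU returns early for every bank.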
  4. 14 Jun, 2017 2 commits
    • x86/mce/AMD: Use saved threshold block info in interrupt handler · 17ef4af0
      Yazen Ghannam committed
      In the amd_threshold_interrupt() handler, we loop through every possible
      block in each bank and rediscover the block's address and whether it's
      usable, i.e. valid, counter present and not locked.
      
      However, we already have the address saved in the threshold blocks list
      for each CPU and bank. The list only contains blocks that have passed
      all the valid checks.
      
      Besides the redundancy, there's also a smp_call_function* in
      get_block_address() which causes a warning when servicing the interrupt:
      
       WARNING: CPU: 0 PID: 0 at kernel/smp.c:281 smp_call_function_single+0xdd/0xf0
       ...
       Call Trace:
        <IRQ>
        rdmsr_safe_on_cpu()
        get_block_address.isra.2()
        amd_threshold_interrupt()
        smp_threshold_interrupt()
        threshold_interrupt()
      
      because we do get called in an interrupt handler *with* interrupts
      disabled, which can result in a deadlock.
      
      Drop the redundant valid checks and move the overflow check, logging and
      block reset into a separate function.
      
      Check the first block then iterate over the rest. This procedure is
      needed since the first block is used as the head of the list.
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20170613162835.30750-3-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
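The "check the first block, then iterate over the rest" pattern from the commit above can be illustrated with a small userspace model. It is a sketch under stated assumptions, not the kernel implementation: `struct block`, `log_and_reset_block()` and `handle_bank()` are hypothetical names, and a plain circular `next` pointer stands in for the kernel's list linkage. Because the first block doubles as the list head, a walk that starts at `first->next` and stops on reaching `first` never visits the head itself, so the head has to be handled separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct block {
    unsigned int  address;   /* saved address of the block (illustrative) */
    bool          overflow;  /* stands in for the counter-overflow bit */
    struct block *next;      /* circular: last block points to the first */
};

/* Overflow check, logging and reset gathered in one helper. */
static int log_and_reset_block(struct block *b)
{
    if (!b || !b->overflow)
        return 0;

    /* A real handler would log the error here. */
    b->overflow = false;
    return 1;
}

/* Returns the number of blocks that were logged and reset. */
static int handle_bank(struct block *first)
{
    int handled;

    if (!first)
        return 0;
    assert(first->next != NULL);   /* the ring must be closed */

    /* The first block is the list head, so check it on its own. */
    handled = log_and_reset_block(first);

    /* Then walk the remaining blocks until we are back at the head. */
    for (struct block *b = first->next; b != first; b = b->next)
        handled += log_and_reset_block(b);

    return handled;
}
```

Nothing in this walk rediscovers addresses or repeats the validity checks, since only blocks that already passed them are on the list.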
    • x86/mce/AMD: Use msr_stat when clearing MCA_STATUS · a24b8c34
      Yazen Ghannam committed
      The value of MCA_STATUS is used as the MSR when clearing MCA_STATUS.
      
      This may cause the following warning:
      
       unchecked MSR access error: WRMSR to 0x11b (tried to write 0x0000000000000000)
       Call Trace:
        <IRQ>
        smp_threshold_interrupt()
        threshold_interrupt()
      
      Use msr_stat instead which has the MSR address.
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Fixes: 37d43acf ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers")
      Link: http://lkml.kernel.org/r/20170613162835.30750-2-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
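The address-versus-value mix-up fixed by the commit above is easy to reproduce in a toy model. The sketch below is an assumption-laden illustration rather than kernel code: `fake_msr[]`, `fake_rdmsr()`, `fake_wrmsr()` and `read_and_clear_status()` are hypothetical stand-ins for the real MSR accessors, and the tiny array replaces the hardware MSR space.

```c
#include <assert.h>
#include <stdint.h>

#define NR_FAKE_MSRS 16   /* hypothetical, tiny MSR space for illustration */

static uint64_t fake_msr[NR_FAKE_MSRS];

static uint64_t fake_rdmsr(uint32_t addr)
{
    assert(addr < NR_FAKE_MSRS);
    return fake_msr[addr];
}

static void fake_wrmsr(uint32_t addr, uint64_t val)
{
    /* Writing to an address outside the modelled space is the bug. */
    assert(addr < NR_FAKE_MSRS);
    fake_msr[addr] = val;
}

/* Read the status register at msr_stat and clear it if it was set. */
static uint64_t read_and_clear_status(uint32_t msr_stat)
{
    uint64_t status = fake_rdmsr(msr_stat);

    if (status)
        fake_wrmsr(msr_stat, 0);   /* the address, not `status` (the value) */

    return status;
}
```

Passing `status` as the first argument to `fake_wrmsr()` would treat the register contents as an address, which is the kind of stray write the "unchecked MSR access error" in the commit message reports.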
  5. 22 May, 2017 3 commits
  6. 31 Mar, 2017 1 commit
  7. 24 Jan, 2017 2 commits
  8. 05 Jan, 2017 1 commit
    • x86/irq, trace: Add __irq_entry annotation to x86's platform IRQ handlers · c4158ff5
      Daniel Bristot de Oliveira committed
      This patch adds the __irq_entry annotation to the default x86
      platform IRQ handlers. ftrace's function_graph tracer uses the
      __irq_entry annotation to notify the entry and return of IRQ
      handlers.
      
      For example, before the patch:
        354549.667252 |   3)  d..1              |  default_idle_call() {
        354549.667252 |   3)  d..1              |    arch_cpu_idle() {
        354549.667253 |   3)  d..1              |      default_idle() {
        354549.696886 |   3)  d..1              |        smp_trace_reschedule_interrupt() {
        354549.696886 |   3)  d..1              |          irq_enter() {
        354549.696886 |   3)  d..1              |            rcu_irq_enter() {
      
      After the patch:
        366416.254476 |   3)  d..1              |    arch_cpu_idle() {
        366416.254476 |   3)  d..1              |      default_idle() {
        366416.261566 |   3)  d..1  ==========> |
        366416.261566 |   3)  d..1              |        smp_trace_reschedule_interrupt() {
        366416.261566 |   3)  d..1              |          irq_enter() {
        366416.261566 |   3)  d..1              |            rcu_irq_enter() {
      
      KASAN also uses this annotation. The smp_apic_timer_interrupt()
      was already annotated.
      Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Claudio Fontana <claudio.fontana@huawei.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicolai Stange <nicstange@gmail.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: linux-edac@vger.kernel.org
      Link: http://lkml.kernel.org/r/059fdf437c2f0c09b13c18c8fe4e69999d3ffe69.1483528431.git.bristot@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 27 Dec, 2016 1 commit
  10. 10 Dec, 2016 1 commit
  11. 22 Nov, 2016 1 commit
    • x86/mce/AMD: Add system physical address translation for AMD Fam17h · f5382de9
      Yazen Ghannam committed
      The Unified Memory Controllers (UMCs) on Fam17h log a normalized address
      in their MCA_ADDR registers. We need to convert that normalized address
      to a system physical address in order to support a few facilities:
      
      1) To offline poisoned pages in DRAM proactively in the deferred error
         handler.
      
      2) To print sysaddr and page info for DRAM ECC errors in EDAC.
      
      [ Boris: fixes/cleanups on top:
      
        * hi_addr_offset = 0 - no need for that branch. Stick it all under the
          HiAddrOffsetEn case. It confines hi_addr_offset's declaration too.
      
        * Move variables to the innermost scope they're used at so that we save
          on stack and not blow it up immediately on function entry.
      
        * Do not modify *sys_addr prematurely - we want to not exit early and
          have modified *sys_addr some, which callers get to see. We either
          convert to a sys_addr or we don't do anything. And we signal that with
          the retval of the function.
      
        * Rename label out -> out_err - because it is the error path.
      
        * No need to pr_err of the conversion failed case: imagine a
          sparsely-populated machine with UMCs which don't have DIMMs. Callers
          should look at the retval instead and issue a printk only when really
          necessary. No need for useless info in dmesg.
      
        * s/temp_reg/tmp/ and other variable names shortening => shorter code.
      
        * Use BIT() everywhere.
      
        * Make error messages more informative.
      
        * Small build fix for the !CONFIG_X86_MCE_AMD case.
      
        * ... and more minor cleanups.
      ]
      Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20161122111133.mjzpvzhf7o7yl2oa@pd.tnic
      [ Typo fixes. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 21 Nov, 2016 1 commit
    • x86/MCE/AMD: Fix thinko about thresholding_en · 254fe9c7
      Borislav Petkov committed
      So adding thresholding_en et al was a good thing for removing the
      per-CPU thresholding callback, i.e., threshold_cpu_callback.
      
      But, in order for it to work and especially that test in
      mce_threshold_create_device() so that all thresholding banks get
      properly created and not the whole thing to fail with a NULL ptr
      dereference at mce_cpu_pre_down() when we offline the CPUs, we need to
      set the thresholding_en flag *before* we start creating the devices.
      
      Yap, it failed because thresholding_en wasn't set at the time
      we were creating the banks so we didn't create any and then at
      mce_cpu_pre_down() -> mce_threshold_remove_device() time, we would blow
      up.
      
      And the fix is actually easy: we have thresholding on the system when we
      have managed to set the thresholding vector to amd_threshold_interrupt()
      earlier in mce_amd_feature_init() while we were picking apart the
      thresholding banks and what is set and what not.
      
      So let's do that.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Fixes: 4d7b02d5 ("x86/mcheck: Split threshold_cpu_callback into two callbacks")
      Link: http://lkml.kernel.org/r/20161119103402.5227-1-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
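The ordering problem described in the commit above, where the flag must be set before any device is created, can be modelled in a few lines of userspace C. This is a hedged sketch only: `NR_BANKS`, `bank_created[]`, `feature_init()`, `create_devices()` and `remove_devices()` are hypothetical names that loosely mirror the roles of mce_amd_feature_init(), mce_threshold_create_device() and mce_threshold_remove_device(); only `thresholding_en` is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_BANKS 4   /* hypothetical bank count */

static bool thresholding_en;
static bool bank_created[NR_BANKS];

/* Models feature init: installing the vector means thresholding exists. */
static void feature_init(bool vector_installed)
{
    if (vector_installed)
        thresholding_en = true;   /* set *before* devices are created */
}

static void create_device(int bank)
{
    assert(bank >= 0 && bank < NR_BANKS);
    bank_created[bank] = true;
}

/* Creates nothing unless the flag is already set. Returns the count. */
static int create_devices(void)
{
    int created = 0;

    if (!thresholding_en)
        return 0;

    for (int bank = 0; bank < NR_BANKS; bank++) {
        create_device(bank);
        created++;
    }
    return created;
}

/* Tears down only what was actually created. Returns the count. */
static int remove_devices(void)
{
    int removed = 0;

    for (int bank = 0; bank < NR_BANKS; bank++) {
        if (bank_created[bank]) {
            bank_created[bank] = false;
            removed++;
        }
    }
    return removed;
}
```

If `create_devices()` runs before `feature_init()` has set the flag, no banks exist at teardown time. The sketch's `remove_devices()` tolerates that, whereas the commit message describes the kernel path hitting a NULL pointer dereference in that situation.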
  13. 16 Nov, 2016 5 commits
  14. 09 Nov, 2016 4 commits
  15. 13 Sep, 2016 8 commits
  16. 08 Jul, 2016 1 commit
  17. 12 May, 2016 3 commits
  18. 03 May, 2016 1 commit
  19. 08 Mar, 2016 2 commits