1. 05 Jul 2021, 1 commit
  2. 01 May 2021, 1 commit
  3. 17 Apr 2021, 1 commit
    • powerpc/xive: Use the "ibm,chip-id" property only under PowerNV · e9e16917
      Cédric Le Goater committed
      The 'chip_id' field of the XIVE CPU structure is used to choose a
      target for a source located on the same chip. For that, the XIVE
      driver queries the chip identifier from the "ibm,chip-id" property
      and compares it to a 'src_chip' field identifying the chip of a
      source. This information is only available on the PowerNV platform,
      'src_chip' being assigned to XIVE_INVALID_CHIP_ID under pSeries.
      
      The "ibm,chip-id" property is also not available on all platforms. It
      was first introduced on PowerNV and later, under QEMU for pSeries/KVM.
      However, the property is not part of PAPR and does not exist under
      pSeries/PowerVM.
      
      Assign 'chip_id' to XIVE_INVALID_CHIP_ID by default and let the
      PowerNV platform override the value with the "ibm,chip-id" property.
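      As an illustration, here is a condensed C sketch of the resulting
      logic. The helper name and its shape are hypothetical; the real
      change is spread across xive_prepare_cpu() and a PowerNV-only
      prepare_cpu() hook:

        /* Hypothetical condensed helper illustrating the fix. */
        static void xive_setup_chip_id(struct xive_cpu *xc, bool is_powernv,
                                       struct device_node *np)
        {
                u32 chip_id;

                /* Safe default everywhere, notably on pSeries/PowerVM
                 * where "ibm,chip-id" is absent (not part of PAPR). */
                xc->chip_id = XIVE_INVALID_CHIP_ID;

                /* Only PowerNV is guaranteed to carry the property. */
                if (is_powernv &&
                    !of_property_read_u32(np, "ibm,chip-id", &chip_id))
                        xc->chip_id = chip_id;
        }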
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210413130352.1183267-1-clg@kaod.org
      e9e16917
  4. 14 Apr 2021, 8 commits
  5. 11 Dec 2020, 8 commits
  6. 18 Sep 2020, 1 commit
  7. 28 May 2020, 1 commit
  8. 26 May 2020, 1 commit
    • powerpc/xive: Clear the page tables for the ESB IO mapping · a101950f
      Cédric Le Goater committed
      Commit 1ca3dec2 ("powerpc/xive: Prevent page fault issues in the
      machine crash handler") fixed an issue in the FW-assisted dump of
      machines using the hash MMU and the XIVE interrupt mode under the
      POWER hypervisor. It forced the mapping of the ESB pages of
      interrupts mapped into the Linux IRQ number space, to make sure the
      'crash kexec' sequence worked during such an event. But it did not
      handle the unmapping.
      
      This mapping is now blocking the removal of a passthrough IO adapter
      under the POWER hypervisor because it expects the guest OS to have
      cleared all page table entries related to the adapter. If some are
      still present, the RTAS call which isolates the PCI slot returns error
      9001 "valid outstanding translations".
      
      Remove these mappings in the IRQ data cleanup routine.
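      A minimal sketch of such a cleanup, assuming the xive_irq_data
      field names (eoi_mmio, trig_mmio, esb_shift) and the
      unmap_kernel_range() API available in kernels of that era:

        /* Sketch: clear the kernel page-table entries for the ESB pages
         * before releasing the mappings, so no stale translations remain
         * when the hypervisor isolates the PCI slot. */
        static void xive_cleanup_esb_mappings(struct xive_irq_data *xd)
        {
                unsigned int esb_len = 1u << xd->esb_shift;

                if (xd->eoi_mmio) {
                        unmap_kernel_range((unsigned long)xd->eoi_mmio, esb_len);
                        iounmap(xd->eoi_mmio);
                        /* EOI and trigger pages may alias each other */
                        if (xd->eoi_mmio == xd->trig_mmio)
                                xd->trig_mmio = NULL;
                        xd->eoi_mmio = NULL;
                }
                if (xd->trig_mmio) {
                        unmap_kernel_range((unsigned long)xd->trig_mmio, esb_len);
                        iounmap(xd->trig_mmio);
                        xd->trig_mmio = NULL;
                }
        }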
      
      Under KVM, this cleanup is not required because the ESB pages for
      the adapter interrupts are unmapped from the guest by the hypervisor
      in the KVM XIVE native device. The new cleanup is therefore
      redundant there, but harmless.
      
      Fixes: 1ca3dec2 ("powerpc/xive: Prevent page fault issues in the machine crash handler")
      Cc: stable@vger.kernel.org # v5.5+
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200429075122.1216388-2-clg@kaod.org
      a101950f
  9. 07 May 2020, 1 commit
    • powerpc/xive: Enforce load-after-store ordering when StoreEOI is active · b1f9be93
      Cédric Le Goater committed
      When an interrupt has been handled, the OS notifies the interrupt
      controller with an EOI sequence. On a POWER9 system using the XIVE
      interrupt controller, this can be done with a load or a store
      operation on the ESB interrupt management page of the interrupt. The
      StoreEOI operation has less latency and improves interrupt handling
      performance, but it was deactivated during the POWER9 DD2.0
      timeframe because of ordering issues. We use LoadEOI today, but plan
      to reactivate StoreEOI in future architectures.
      
      There is usually no need to enforce ordering between ESB load and
      store operations as they should lead to the same result. E.g. a store
      trigger and a load EOI can be executed in any order. Assuming the
      interrupt state is PQ=10, a store trigger followed by a load EOI will
      return a Q bit. In the reverse order, it will create a new interrupt
      trigger from HW. In both cases, the handler processing interrupts is
      notified.
      
      In some cases, the XIVE_ESB_SET_PQ_10 load operation is used to
      temporarily disable the interrupt source (mask/unmask). When the
      source is re-enabled, the OS can detect if interrupts were received
      while the source was disabled and reinject them. This process needs
      special care when StoreEOI is activated: the ESB load and store
      operations must be correctly ordered, because a XIVE_ESB_STORE_EOI
      operation could leave the source enabled if it has not completed
      before the loads.
      
      For those cases, we enforce Load-after-Store ordering with a special
      load operation offset. To avoid performance impact, this ordering is
      only enforced when really needed, that is when interrupt sources are
      temporarily disabled with the XIVE_ESB_SET_PQ_10 load. It should not
      be needed for other loads.
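      A sketch of the hot path, assuming the XIVE_ESB_LD_ST_MO offset
      value from the patch and a simplified read helper (the real
      xive_esb_read() also handles hypervisor-mediated ESB accesses):

        #define XIVE_ESB_LD_ST_MO  0x40  /* Load-after-store ordering */

        /* Sketch of xive_esb_read(): the ordering offset is ORed in only
         * for the PQ=10 (mask) load on StoreEOI sources, so the common
         * EOI loads keep their fast path. */
        static u64 xive_esb_read(struct xive_irq_data *xd, u32 offset)
        {
                if (offset == XIVE_ESB_SET_PQ_10 &&
                    xd->flags & XIVE_IRQ_FLAG_STORE_EOI)
                        offset |= XIVE_ESB_LD_ST_MO;

                return in_be64(xd->eoi_mmio + offset);
        }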
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200220081506.31209-1-clg@kaod.org
      b1f9be93
  10. 26 Mar 2020, 4 commits
  11. 22 Jan 2020, 1 commit
  12. 13 Nov 2019, 1 commit
  13. 13 Sep 2019, 2 commits
  14. 19 Aug 2019, 1 commit
  15. 16 Aug 2019, 1 commit
    • powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race · da15c03b
      Paul Mackerras committed
      Testing has revealed the existence of a race condition where a XIVE
      interrupt being shut down can be in one of the XIVE interrupt queues
      (of which there are up to 8 per CPU, one for each priority) at the
      point where free_irq() is called.  If this happens, the interrupt
      fetch from the queue can return an interrupt number which has
      already been shut down.  This can lead to various symptoms:
      
      - irq_to_desc(irq) can be NULL.  In this case, no end-of-interrupt
        function gets called, so the CPU's elevated interrupt priority
        (numerically lowered CPPR) is never reset.  The CPU then stops
        processing interrupts, causing device timeouts and other errors
        in various device drivers.
      
      - The irq descriptor or related data structures can be in the process
        of being freed as the interrupt code is using them.  This typically
        leads to crashes due to bad pointer dereferences.
      
      This race is basically what commit 62e04686 ("genirq: Add optional
      hardware synchronization for shutdown", 2019-06-28) is intended to
      fix, given a get_irqchip_state() method for the interrupt controller
      being used.  It works by polling the interrupt controller when an
      interrupt is being freed until the controller says it is not pending.
      
      With XIVE, the PQ bits of the interrupt source indicate the state of
      the interrupt source, and in particular the P bit goes from 0 to 1 at
      the point where the hardware writes an entry into the interrupt queue
      that this interrupt is directed towards.  Normally, the code will then
      process the interrupt and do an end-of-interrupt (EOI) operation which
      will reset PQ to 00 (assuming another interrupt hasn't been generated
      in the meantime).  However, there are situations where the code resets
      P even though a queue entry exists (for example, by setting PQ to 01,
      which disables the interrupt source), and also situations where the
      code leaves P at 1 after removing the queue entry (for example, this
      is done for escalation interrupts so they cannot fire again until
      they are explicitly re-enabled).
      
      The code already has a 'saved_p' flag for the interrupt source which
      indicates that a queue entry exists, although it isn't maintained
      consistently.  This patch adds a 'stale_p' flag to indicate that
      P has been left at 1 after processing a queue entry, and adds code
      to set and clear saved_p and stale_p as necessary to maintain a
      consistent indication of whether a queue entry may or may not exist.
      
      With this, we can implement xive_get_irqchip_state() by looking at
      stale_p, saved_p and the ESB PQ bits for the interrupt.
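      A sketch of that check, modeled on the genirq get_irqchip_state()
      hook; the exact combination of flags and the PQ read shown here is
      an assumption:

        /* Sketch: an interrupt is "active" if a queue entry may exist,
         * i.e. saved_p is set or the ESB P bit reads back as 1, unless
         * stale_p says the P bit was deliberately left behind. */
        static int xive_get_irqchip_state(struct irq_data *data,
                                          enum irqchip_irq_state which,
                                          bool *state)
        {
                struct xive_irq_data *xd =
                        irq_data_get_irq_handler_data(data);

                switch (which) {
                case IRQCHIP_STATE_ACTIVE:
                        *state = !xd->stale_p &&
                                 (xd->saved_p ||
                                  !!(xive_esb_read(xd, XIVE_ESB_GET) &
                                     XIVE_ESB_VAL_P));
                        return 0;
                default:
                        return -EINVAL;
                }
        }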
      
      There is some additional code to handle escalation interrupts
      properly, because they are enabled and disabled in KVM assembly
      code, which does not have access to the xive_irq_data struct for the
      escalation interrupt.  Hence, stale_p may be incorrect when the
      escalation interrupt is freed in kvmppc_xive_{,native_}cleanup_vcpu().
      Fortunately, we can fix it up by looking at vcpu->arch.xive_esc_on,
      with some careful attention to barriers in order to ensure the correct
      result if xive_esc_irq() races with kvmppc_xive_cleanup_vcpu().
      
      Finally, this adds code to make noise on the console (pr_crit and
      WARN_ON(1)) if we find an interrupt queue entry for an interrupt
      which does not have a descriptor.  While this won't catch the race
      reliably, if it does get triggered it will be an indication that
      the race is occurring and needs to be debugged.
      
      Fixes: 243e2511 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190813100648.GE9567@blackberry
      da15c03b
  16. 05 Aug 2019, 1 commit
  17. 18 Jul 2019, 1 commit
    • powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask() · 4d202c8c
      Gautham R. Shenoy committed
      xive_find_target_in_mask() has the following for(;;) loop, which
      has a bug when @first == cpumask_first(@mask) and condition 1 fails
      to hold for every CPU in @mask. In this case we loop forever.
      
        first = cpu;
        for (;;) {
                if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
                        return cpu;
                cpu = cpumask_next(cpu, mask);
                if (cpu == first) // condition 2
                        break;

                if (cpu >= nr_cpu_ids) // condition 3
                        cpu = cpumask_first(mask);
        }
      
      This is because, when @first == cpumask_first(@mask), we never hit
      condition 2 (cpu == first): prior to this check, we have executed
      "cpu = cpumask_next(cpu, mask)", which sets the value of @cpu to a
      value greater than @first or to nr_cpu_ids. Coupled with the fact
      that condition 1 is never met, we never exit this loop.
      
      This was discovered by the hard-lockup detector while running LTP test
      concurrently with SMT switch tests.
      
       watchdog: CPU 12 detected hard LOCKUP on other CPUs 68
       watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago)
       watchdog: CPU 68 Hard LOCKUP
       watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago)
       CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1
       NIP:  c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000
       REGS: c000201fff3c7d80 TRAP: 0100   Not tainted  (4.18.0-100.el8.ppc64le)
       MSR:  9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 24028424  XER: 00000000
       CFAR: c0000000006f558c IRQMASK: 1
       GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18
       GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8
       GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8
       GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000
       GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001
       GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18
       GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0
       GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f
       NIP [c0000000006f5578] find_next_bit+0x38/0x90
       LR [c000000000cba9ec] cpumask_next+0x2c/0x50
       Call Trace:
       [c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable)
       [c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240
       [c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0
       [c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260
       [c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0
       [c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0
       [c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90
       [c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220
       [c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x]
       [c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x]
       [c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0
       [c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50
       [c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0
       [c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0
       [c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70
       [c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160
       [c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70
       Instruction dump:
       78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039
       4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040
      
      To fix this, move the check for condition 2 after the check for
      condition 3, so that we are able to break out of the loop soon after
      iterating through all the CPUs in @mask in the problem case. Use a
      do-while loop to achieve this, as sketched below.
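      A sketch of the fixed loop, mirroring the reordering described
      above (the surrounding function is elided):

        /* Fixed walk: handle the wrap (condition 3) before testing for
         * termination, so one full pass over @mask always terminates,
         * even when @first == cpumask_first(@mask). */
        first = cpu;
        do {
                if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
                        return cpu;
                cpu = cpumask_next(cpu, mask);
                if (cpu >= nr_cpu_ids) // condition 3
                        cpu = cpumask_first(mask);
        } while (cpu != first); // condition 2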
      
      Fixes: 243e2511 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Reported-by: Indira P. Joga <indira.priya@in.ibm.com>
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1563359724-13931-1-git-send-email-ego@linux.vnet.ibm.com
      4d202c8c
  18. 31 May 2019, 1 commit
  19. 15 Jan 2019, 1 commit
  20. 25 Nov 2018, 1 commit
  21. 03 Oct 2018, 1 commit
  22. 24 Aug 2018, 1 commit