1. 02 May, 2017 2 commits
    • cxl: Route eeh events to all drivers in cxl_pci_error_detected() · 4f58f0bf
      Vaibhav Jain committed
      Fix a boundary condition where, in some cases, an EEH event that results
      in a card reset isn't passed on to a driver attached to the virtual PCI
      device associated with a slice. This happens when a slice-attached
      device driver returns a value other than PCI_ERS_RESULT_NEED_RESET from
      the EEH error_detected() callback. That causes an early return from
      cxl_pci_error_detected(), so drivers attached to other AFUs on the card
      won't be notified.
      
      The patch fixes this by making sure that all slice attached
      device-drivers are notified and the return values from
      error_detected() callback are aggregated in a scheme where request for
      'disconnect' trumps all and 'none' trumps 'need_reset'.
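
The aggregation rule described above can be sketched in plain C. This is only an illustration under stated assumptions: the enum `ers_result` and the helpers `ers_merge()` and `ers_aggregate()` are hypothetical stand-ins, not the kernel's `pci_ers_result_t` or the code in the patch, and only the three results named in the message are modelled.

```c
/* Hypothetical stand-ins for the three results named in the commit message. */
enum ers_result {
    ERS_NONE,
    ERS_NEED_RESET,
    ERS_DISCONNECT,
};

/* Fold one driver's answer into the running result:
 * 'disconnect' beats everything, 'none' beats 'need_reset'. */
static enum ers_result ers_merge(enum ers_result acc, enum ers_result next)
{
    if (acc == ERS_DISCONNECT || next == ERS_DISCONNECT)
        return ERS_DISCONNECT;
    if (acc == ERS_NONE || next == ERS_NONE)
        return ERS_NONE;
    return ERS_NEED_RESET;
}

/* Visit every driver's answer; there is no early return, so each
 * driver is "notified" regardless of what the others reported. */
static enum ers_result ers_aggregate(const enum ers_result *answers, int n)
{
    enum ers_result acc = ERS_NEED_RESET; /* lowest-priority starting value */

    for (int i = 0; i < n; i++)
        acc = ers_merge(acc, answers[i]);
    return acc;
}
```

Starting from the lowest-priority value keeps the fold simple: any stronger answer from any slice replaces it, and the loop never exits before the last driver has been consulted.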
      
      Fixes: 9e8df8a2 ("cxl: EEH support")
      Cc: stable@vger.kernel.org # v4.3+
      Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: Force context lock during EEH flow · ea9a26d1
      Vaibhav Jain committed
      During an EEH event, when the cxl card is fenced and the card sysfs
      attribute perst_reloads_same_image is set, the following warning message
      is seen in the kernel logs:
      
        Adapter context unlocked with 0 active contexts
        ------------[ cut here ]------------
        WARNING: CPU: 12 PID: 627 at
        ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
      
      Even though this warning is harmless, it clutters the kernel log
      during an EEH event. The warning is triggered because the EEH callback
      cxl_pci_error_detected() doesn't obtain the context-lock before forcibly
      detaching all active contexts. When the context-lock is later released
      during the call to cxl_configure_adapter() from cxl_pci_slot_reset(),
      the warning in cxl_adapter_context_unlock() fires.
      
      To fix this warning, we acquire the adapter context-lock via
      cxl_adapter_context_lock() in the eeh callback
      cxl_pci_error_detected() once all the virtual AFU PHBs are notified
      and their contexts detached. The context-lock is released in
      cxl_pci_slot_reset() after the adapter is successfully reconfigured
      and before we call the slot_reset callback on slice-attached
      device drivers.
      
      Fixes: 70b565bb ("cxl: Prevent adapter reset if an active context exists")
      Cc: stable@vger.kernel.org # v4.9+
      Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Tested-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 19 April, 2017 1 commit
  3. 13 April, 2017 5 commits
  4. 21 February, 2017 1 commit
    • cxl: fix nested locking hang during EEH hotplug · 171ed0fc
      Andrew Donnellan committed
      Commit 14a3ae34 ("cxl: Prevent read/write to AFU config space while AFU
      not configured") introduced a rwsem to fix an invalid memory access that
      occurred when someone attempts to access the config space of an AFU on a
      vPHB whilst the AFU is deconfigured, such as during EEH recovery.
      
      It turns out that it's possible to run into a nested locking issue when EEH
      recovery fails and a full device hotplug is required.
      cxl_pci_error_detected() deconfigures the AFU, taking a writer lock on
      configured_rwsem. When EEH recovery fails, the EEH code calls
      pci_hp_remove_devices() to remove the device, which in turn calls
      cxl_remove() -> cxl_pci_remove_afu() -> pci_deconfigure_afu(), which tries
      to grab the writer lock that's already held.
      
      Standard rwsem semantics don't express what we really want to do here and
      don't allow for nested locking. Fix this by replacing the rwsem with an
      atomic_t which we can control more finely. Allow the AFU to be locked
      multiple times so long as there are no readers.
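
A minimal sketch of the idea, written with C11 atomics rather than the kernel's atomic_t. The state encoding (negative means locked, zero means idle, positive counts readers) and the function names `afu_try_lock()`, `afu_unlock()`, `afu_try_read_get()` and `afu_read_put()` are assumptions made for illustration, not the names or the exact logic used by the patch. The sketch is also non-blocking, whereas a real implementation would wait for readers to drain.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed encoding: < 0 locked (deconfigured), 0 idle, > 0 active readers. */

/* Try to take the lock. Succeeds when idle, and also when the lock is
 * already held, which is what makes nested locking harmless. */
static bool afu_try_lock(atomic_int *state)
{
    int expected = 0;

    if (atomic_compare_exchange_strong(state, &expected, -1))
        return true;
    /* On failure, 'expected' holds the value actually observed. */
    return expected < 0;
}

/* Release the lock and allow readers again. */
static void afu_unlock(atomic_int *state)
{
    atomic_store(state, 0);
}

/* Register a reader unless the AFU is locked. */
static bool afu_try_read_get(atomic_int *state)
{
    int cur = atomic_load(state);

    while (cur >= 0) {
        /* On failure, 'cur' is refreshed and the loop re-checks it. */
        if (atomic_compare_exchange_weak(state, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Drop a reader reference taken with afu_try_read_get(). */
static void afu_read_put(atomic_int *state)
{
    atomic_fetch_sub(state, 1);
}
```

With this shape, a second lock attempt on an already-locked AFU returns success instead of deadlocking, which is the property a standard rwsem cannot provide.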
      
      Fixes: 14a3ae34 ("cxl: Prevent read/write to AFU config space while AFU not configured")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 25 January, 2017 2 commits
  6. 18 November, 2016 1 commit
  7. 19 October, 2016 1 commit
    • cxl: Prevent adapter reset if an active context exists · 70b565bb
      Vaibhav Jain committed
      This patch prevents resetting the cxl adapter via sysfs in the presence
      of one or more active cxl_context instances on it. This protects against
      an unrecoverable error caused by the PSL owning a dirty cache line even
      after reset while the host tries to touch the same cache line. If a
      forced reset of the card is required irrespective of any active
      contexts, the int value -1 can be stored in the 'reset' sysfs attribute
      of the card.
      
      The patch introduces a new atomic_t member named contexts_num inside
      struct cxl that holds the number of active contexts attached to the
      card, which is checked against '0' before proceeding with the reset. To
      guard against a race condition where a context is activated just after
      the reset check is performed, contexts_num is atomically set to '-1'
      after the reset check to indicate that no more contexts can be activated
      on the card.
      
      Before activating a context we atomically test whether contexts_num is
      non-negative and, if so, increment its value by one. A negative
      contexts_num indicates that the card is about to be reset, and context
      activation fails at that point.
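
The counter protocol described above can be sketched with C11 atomics. This is an illustration under assumptions, not the driver's code: `struct demo_adapter` and the `demo_*` functions are hypothetical names, and only the `contexts_num` field name comes from the commit message.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the adapter; only contexts_num is modelled. */
struct demo_adapter {
    atomic_int contexts_num; /* >= 0: active contexts, -1: locked for reset */
};

/* Activate a context: increment only while the count is non-negative. */
static bool demo_context_get(struct demo_adapter *a)
{
    int cur = atomic_load(&a->contexts_num);

    while (cur >= 0) {
        /* On failure, 'cur' is refreshed and the loop re-checks it. */
        if (atomic_compare_exchange_weak(&a->contexts_num, &cur, cur + 1))
            return true;
    }
    return false; /* adapter is about to be reset */
}

/* Deactivate a context previously taken with demo_context_get(). */
static void demo_context_put(struct demo_adapter *a)
{
    atomic_fetch_sub(&a->contexts_num, 1);
}

/* Lock for reset: the check against 0 and the switch to -1 happen in
 * one atomic step, so no context can slip in between them. */
static bool demo_context_lock(struct demo_adapter *a)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&a->contexts_num, &expected, -1);
}

/* Allow context activation again after the reset. */
static void demo_context_unlock(struct demo_adapter *a)
{
    atomic_store(&a->contexts_num, 0);
}
```

Doing the zero check and the transition to -1 as a single compare-and-exchange is what closes the window in which a context could be activated just after the reset check.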
      
      Fixes: 62fa19d4 ("cxl: Add ability to reset the card")
      Cc: stable@vger.kernel.org # v4.0+
      Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 04 October, 2016 1 commit
  9. 13 September, 2016 1 commit
  10. 10 August, 2016 1 commit
  11. 09 August, 2016 1 commit
  12. 14 July, 2016 6 commits
    • cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards · b0b5e591
      Andrew Donnellan committed
      Add a new API, cxl_check_and_switch_mode() to allow for switching of
      bi-modal CAPI cards, such as the Mellanox CX-4 network card.
      
      When a driver requests to switch a card to CAPI mode, use PCI hotplug
      infrastructure to remove all PCI devices underneath the slot. We then write
      an updated mode control register to the CAPI VSEC, hot reset the card, and
      reprobe the card.
      
      As the card may present a different set of PCI devices after the mode
      switch, use the infrastructure provided by the pnv_php driver and the OPAL
      PCI slot management facilities to ensure that:
      
        * the old devices are removed from both the OPAL and Linux device trees
        * the new devices are probed by OPAL and added to the OPAL device tree
        * the new devices are added to the Linux device tree and probed through
          the regular PCI device probe path
      
      As such, introduce a new option, CONFIG_CXL_BIMODAL, with a dependency on
      the pnv_php driver.
      
      Refactor existing code that touches the mode control register in the
      regular single mode case into a new function, setup_cxl_protocol_area().
      Co-authored-by: Ian Munsie <imunsie@au1.ibm.com>
      Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: Workaround PE=0 hardware limitation in Mellanox CX4 · f67a6722
      Ian Munsie committed
      The CX4 card cannot cope with a context with PE=0 due to a hardware
      limitation, resulting in:
      
      [   34.166577] command failed, status limits exceeded(0x8), syndrome 0x5a7939
      [   34.166580] mlx5_core 0000:01:00.1: Failed allocating uar, aborting
      
      Since the kernel API allocates a default context very early during
      device init that will almost certainly get Process Element ID 0 there is
      no easy way for us to extend the API to allow the Mellanox to inform us
      of this limitation ahead of time.
      
      Instead, work around the issue by extending the XSL structure to include
      a minimum PE to allocate. Although the bug is not in the XSL, it is the
      easiest place to work around this limitation given that the CX4 is
      currently the only card that uses an XSL.
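
The effect of a minimum PE can be illustrated with a tiny allocator that hands out the lowest free ID at or above a floor. This is a sketch only: `DEMO_MAX_PE` and `demo_alloc_pe()` are hypothetical, and the real driver relies on the kernel's IDR allocator rather than a boolean array.

```c
#include <stdbool.h>

#define DEMO_MAX_PE 8 /* illustrative table size, not a hardware limit */

/* Return the lowest free process element ID that is >= min_pe and mark
 * it used, or -1 when none is available. */
static int demo_alloc_pe(bool used[DEMO_MAX_PE], int min_pe)
{
    if (min_pe < 0)
        min_pe = 0;

    for (int pe = min_pe; pe < DEMO_MAX_PE; pe++) {
        if (!used[pe]) {
            used[pe] = true;
            return pe;
        }
    }
    return -1;
}
```

With a floor of 1, the first context allocated during device init receives PE 1 instead of PE 0, which is the behaviour the workaround is after.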
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: Add support for using the kernel API with a real PHB · 317f5ef1
      Ian Munsie committed
      This hooks up support for using the kernel API with a real PHB. After
      the AFU initialisation has completed it calls into the PHB code to pass
      it the AFU that will be used by other peer physical functions on the
      adapter.
      
      The cxl_pci_to_afu API is extended to work with peer PCI devices,
      retrieving the peer AFU from the PHB. This API may also now return an
      error if it is called on a PCI device that is not associated with either
      a cxl vPHB or a peer PCI device to an AFU, and this error is propagated
      down.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: Do not create vPHB if there are no AFU configuration records · e4f5fc00
      Ian Munsie committed
      The vPHB model of the cxl kernel API is a hierarchy where the AFU is
      represented by the vPHB, and its AFU configuration records are exposed
      as functions under that vPHB. If there are no AFU configuration records
      we will create a vPHB with nothing under it, which is a waste of
      resources and will opt us into EEH handling despite not having anything
      special to handle.
      
      This also does not make sense for cards using the peer model of the cxl
      kernel API, where the other functions of the device are exposed via
      additional peer physical functions rather than AFU configuration
      records. This model will also not work with the existing EEH handling in
      the cxl driver, as that is designed around the vPHB model.
      
      Skip creating the vPHB for AFUs without any AFU configuration records,
      and opt out of EEH handling for them.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: Enable bus mastering for devices using CAPP DMA mode · 48b3adf3
      Ian Munsie committed
      Devices that use CAPP DMA mode (such as the Mellanox CX4) require bus
      master to be enabled in order for the CAPI traffic to flow. This should
      be harmless to enable for other cxl devices, so unconditionally enable
      it in the adapter init flow.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: Add cxl_slot_is_supported API · 4e56f858
      Ian Munsie committed
      This extends the check that the adapter is in a CAPI capable slot so
      that it may be called by external users in the kernel API. This will be
      used by the upcoming Mellanox CX4 support, which needs to know ahead of
      time if the card can be switched to cxl mode so that it can leave it in
      PCI mode if it is not.
      
      This API takes a parameter to check if CAPP DMA mode is supported, which
      it currently only allows on P8NVL systems, since that mode currently has
      issues accessing memory < 4GB on P8, and we cannot realistically avoid
      that.
      
      This API does not currently check if a CAPP unit is available (i.e. not
      already assigned to another PHB) on P8. Doing so would be racy since it
      is assigned on a first come first serve basis, and so long as CAPP DMA
      mode is not supported on P8 we don't need this, since the only
      anticipated user of this API requires CAPP DMA mode.
      
      Cc: Philippe Bergheaud <felix@linux.vnet.ibm.com>
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  13. 08 July, 2016 3 commits
    • cxl: Ignore CAPI adapters misplaced in switched slots · 3b3dcd61
      Philippe Bergheaud committed
      One should not attempt to switch a PHB into CAPI mode if there is
      a switch between the PHB and the adapter. This patch modifies the
      cxl driver to ignore CAPI adapters misplaced in switched slots.
      Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
      Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Acked-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: Fix bug where AFU disable operation had no effect · 5e7823c9
      Ian Munsie committed
      The AFU disable operation has a bug where it will not clear the enable
      bit and therefore will have no effect. To date this has likely been
      masked by the fact that we perform an AFU reset before the disable,
      which also has the effect of clearing the enable bit, making the
      following disable operation effectively a no-op on most hardware. This patch
      modifies the afu_control function to take a parameter to clear from the
      AFU control register so that the disable operation can clear the
      appropriate bit.
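
A sketch of the register update with an explicit clear mask, assuming hypothetical bit positions. `DEMO_CNTL_ENABLE`, `DEMO_CNTL_DISABLE` and `demo_afu_cntl_next()` are illustrative names only; the real AFU control register layout is defined by the CAIA architecture and is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this illustration. */
#define DEMO_CNTL_ENABLE  (UINT64_C(1) << 0)
#define DEMO_CNTL_DISABLE (UINT64_C(1) << 1)

/* Compute the next register value: drop the bits in 'clear', then set
 * the command bits. Passing clear == 0 models the old behaviour. */
static uint64_t demo_afu_cntl_next(uint64_t reg, uint64_t command,
                                   uint64_t clear)
{
    return (reg & ~clear) | command;
}
```

With `clear` set to 0, a disable command leaves the enable bit in place, which is the bug described above; passing the enable bit as `clear` removes it in the same write that issues the disable.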
      
      This bug was uncovered on the Mellanox CX4, which uses an XSL rather
      than a PSL. On the XSL the reset operation will not complete while the
      AFU is enabled, meaning the enable bit was still set at the start of the
      disable and as a result this bug was hit and the disable also timed out.
      
      Because of this difference in behaviour between the PSL and XSL, this
      patch now makes the reset dependent on the card using a PSL to avoid
      waiting for a timeout on the XSL. It is entirely possible that we may be
      able to drop the reset altogether if it turns out we only ever needed it
      due to this bug - however I am not willing to drop it without further
      regression testing and have added comments to the code explaining the
      background.
      
      This also fixes a small issue where the AFU_Cntl register was read
      outside of the lock that protects it.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: Fix allowing bogus AFU descriptors with 0 maximum processes · 49e9c99f
      Ian Munsie committed
      If the AFU descriptor of an AFU directed AFU indicates that it supports
      0 maximum processes, we will accept that value and attempt to use it.
      The SPA will still be allocated (with 2 pages due to another minor bug
      and room for 958 processes), and when a context is allocated we will
      pass the value of 0 to idr_alloc as the maximum. However, idr_alloc will
      treat that as meaning no maximum and will allocate a context number and
      we return a valid context.
      
      Conceivably, this could lead to a buffer overflow of the SPA if more
      than 958 contexts were allocated, however this is mitigated by the fact
      that there are no known AFUs in the wild with a bogus AFU descriptor
      like this, and that only the root user is allowed to flash an AFU image
      to a card.
      
      Add a check when validating the AFU descriptor to reject any with 0
      maximum processes.
      
      We do still allow a dedicated process only AFU to indicate that it
      supports 0 contexts even though that is forbidden in the architecture,
      as in that case we ignore the value and use 1 instead. This is just on
      the off-chance that such a dedicated process AFU may exist (not that I
      am aware of any), since their developers are less likely to have cared
      about this value at all.
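
The validation rule and the dedicated-process exception can be sketched as follows. `struct demo_afu_desc`, its fields, and both helpers are hypothetical and exist only to illustrate the check; they are not the driver's data structures.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of an AFU descriptor. */
struct demo_afu_desc {
    bool afu_directed;  /* AFU supports AFU-directed mode */
    unsigned max_procs; /* maximum processes advertised by the descriptor */
};

/* Return 0 when the descriptor is acceptable, -1 when it must be
 * rejected. Only AFU-directed AFUs are required to advertise > 0. */
static int demo_validate_desc(const struct demo_afu_desc *d)
{
    if (d->afu_directed && d->max_procs == 0)
        return -1;
    return 0;
}

/* Dedicated-process-only AFUs have the advertised value ignored and
 * are treated as supporting exactly one process. */
static unsigned demo_effective_procs(const struct demo_afu_desc *d)
{
    return d->afu_directed ? d->max_procs : 1;
}
```

Rejecting the zero value up front matters because, as the message notes, passing 0 as the maximum to idr_alloc means "no maximum" rather than "none allowed".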
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  14. 16 June, 2016 2 commits
    • cxl: Add support for CAPP DMA mode · b385c9e9
      Ian Munsie committed
      This adds support for using CAPP DMA mode, which is required for XSL
      based cards such as the Mellanox CX4 to function.
      
      This is currently an RFC as it depends on the corresponding support to
      be merged into skiboot first, which was submitted here:
      http://patchwork.ozlabs.org/patch/625582/
      
      In the event that the skiboot on the system does not have the above
      support, it will indicate as such in the kernel log and abort the init
      process.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: Abstract the differences between the PSL and XSL · 6d382616
      Frederic Barrat committed
      The XSL (Translation Service Layer) is a stripped down version of the
      PSL (Power Service Layer) used in some cards such as the Mellanox CX4.
      
      Like the PSL, it implements the CAIA architecture, but has a number of
      differences, mostly in its implementation-dependent registers. This
      adds an ops structure to abstract these differences to bring initial
      support for XSL CAPI devices.
      
      The XSL does not implement the optional architected SERR register,
      however while it treats it as a reserved register and should work with
      no special treatment, attempting to access it will cause the XSL_FEC
      (First Error Capture) register to be filled out, preventing it from
      capturing any subsequent errors. Therefore, this patch also prevents the
      kernel from trying to set up the SERR register so that the FEC register
      may still be useful, and to save one interrupt.
      
      The XSL also uses a special DMA cxl mode, which uses a slightly
      different init sequence for the CAPP and PHB. The kernel support for
      this will be in a future patch once the corresponding support has been
      merged into skiboot.
      Co-authored-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  15. 22 April, 2016 2 commits
  16. 11 April, 2016 1 commit
  17. 09 March, 2016 7 commits
  18. 29 February, 2016 1 commit
  19. 06 February, 2016 1 commit
    • PCI: Remove includes of asm/pci-bridge.h · 952bbcb0
      Bjorn Helgaas committed
      Drivers should include asm/pci-bridge.h only when they need the arch-
      specific things provided there.  Outside of the arch/ directories, the only
      drivers that actually need things provided by asm/pci-bridge.h are the
      powerpc RPA hotplug drivers in drivers/pci/hotplug/rpa*.
      
      Remove the includes of asm/pci-bridge.h from the other drivers, adding an
      include of linux/pci.h if necessary.
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>