1. 23 Jun, 2017 1 commit
  2. 09 Jun, 2017 1 commit
    • cxl: Avoid double free_irq() for psl,slice interrupts · ed45509b
      Authored by Vaibhav Jain
      During EEH recovery, a call to cxl_remove() can result in a double
      free_irq() of the psl,slice interrupts. This can happen if
      perst_reloads_same_image == 1 and the call to cxl_configure_adapter()
      fails during the slot_reset callback. In such a case we see a kernel
      oops with the following back-trace:
      
      Oops: Kernel access of bad area, sig: 11 [#1]
      Call Trace:
        free_irq+0x88/0xd0 (unreliable)
        cxl_unmap_irq+0x20/0x40 [cxl]
        cxl_native_release_psl_irq+0x78/0xd8 [cxl]
        pci_deconfigure_afu+0xac/0x110 [cxl]
        cxl_remove+0x104/0x210 [cxl]
        pci_device_remove+0x6c/0x110
        device_release_driver_internal+0x204/0x2e0
        pci_stop_bus_device+0xa0/0xd0
        pci_stop_and_remove_bus_device+0x28/0x40
        pci_hp_remove_devices+0xb0/0x150
        pci_hp_remove_devices+0x68/0x150
        eeh_handle_normal_event+0x140/0x580
        eeh_handle_event+0x174/0x360
        eeh_event_handler+0x1e8/0x1f0
      
      This patch fixes the double free_irq() by checking that the
      variables that hold the virqs (err_hwirq, serr_hwirq, psl_virq) are
      not '0' before un-mapping them, and by resetting these variables to
      '0' once they are un-mapped.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
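
The guard described in this commit message can be illustrated with a small userspace sketch. This is not the driver code: `mock_request_irq()`, `mock_free_irq()`, `release_irq_once()` and `struct afu_sketch` are hypothetical names invented for the example, and the `assert()` inside `mock_free_irq()` merely stands in for the oops seen when an interrupt is freed twice.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VIRQ 16

/* Stand-in for kernel IRQ bookkeeping: which virqs are currently requested. */
static bool requested[MAX_VIRQ];
static int free_count;

/* Hypothetical stand-in for request_irq(). */
static void mock_request_irq(unsigned int virq)
{
    assert(virq > 0 && virq < MAX_VIRQ);
    requested[virq] = true;
}

/* Hypothetical stand-in for free_irq(); freeing twice trips the assert. */
static void mock_free_irq(unsigned int virq)
{
    assert(virq < MAX_VIRQ && requested[virq]);
    requested[virq] = false;
    free_count++;
}

/* Field names follow the commit message; 0 means "not mapped". */
struct afu_sketch {
    unsigned int err_hwirq;
    unsigned int serr_hwirq;
    unsigned int psl_virq;
};

/* Release an interrupt at most once, however often this is called. */
static void release_irq_once(unsigned int *virq)
{
    if (*virq == 0)
        return;            /* already released: nothing to do */
    mock_free_irq(*virq);
    *virq = 0;             /* remember that it is gone */
}
```

With this pattern a second teardown pass, such as the one triggered from the EEH path in the back-trace, becomes a no-op instead of a second free.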
  3. 08 Jun, 2017 1 commit
  4. 06 Jun, 2017 1 commit
    • cxl: Avoid double free_irq() for psl,slice interrupts · b3aa20ba
      Authored by Vaibhav Jain
      During EEH recovery, a call to cxl_remove() can result in a double
      free_irq() of the psl,slice interrupts. This can happen if
      perst_reloads_same_image == 1 and the call to cxl_configure_adapter()
      fails during the slot_reset callback. In such a case we see a kernel
      oops with the following back-trace:
      
      Oops: Kernel access of bad area, sig: 11 [#1]
      Call Trace:
        free_irq+0x88/0xd0 (unreliable)
        cxl_unmap_irq+0x20/0x40 [cxl]
        cxl_native_release_psl_irq+0x78/0xd8 [cxl]
        pci_deconfigure_afu+0xac/0x110 [cxl]
        cxl_remove+0x104/0x210 [cxl]
        pci_device_remove+0x6c/0x110
        device_release_driver_internal+0x204/0x2e0
        pci_stop_bus_device+0xa0/0xd0
        pci_stop_and_remove_bus_device+0x28/0x40
        pci_hp_remove_devices+0xb0/0x150
        pci_hp_remove_devices+0x68/0x150
        eeh_handle_normal_event+0x140/0x580
        eeh_handle_event+0x174/0x360
        eeh_event_handler+0x1e8/0x1f0
      
      This patch fixes the double free_irq() by checking that the
      variables that hold the virqs (err_hwirq, serr_hwirq, psl_virq) are
      not '0' before un-mapping them, and by resetting these variables to
      '0' once they are un-mapped.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 02 May, 2017 3 commits
  6. 19 Apr, 2017 1 commit
  7. 13 Apr, 2017 7 commits
  8. 24 Mar, 2017 1 commit
  9. 20 Mar, 2017 1 commit
  10. 02 Mar, 2017 4 commits
  11. 25 Feb, 2017 1 commit
  12. 21 Feb, 2017 1 commit
    • cxl: fix nested locking hang during EEH hotplug · 171ed0fc
      Authored by Andrew Donnellan
      Commit 14a3ae34 ("cxl: Prevent read/write to AFU config space while AFU
      not configured") introduced a rwsem to fix an invalid memory access that
      occurred when someone attempted to access the config space of an AFU on a
      vPHB whilst the AFU was deconfigured, such as during EEH recovery.
      
      It turns out that it's possible to run into a nested locking issue when EEH
      recovery fails and a full device hotplug is required.
      cxl_pci_error_detected() deconfigures the AFU, taking a writer lock on
      configured_rwsem. When EEH recovery fails, the EEH code calls
      pci_hp_remove_devices() to remove the device, which in turn calls
      cxl_remove() -> cxl_pci_remove_afu() -> pci_deconfigure_afu(), which tries
      to grab the writer lock that's already held.
      
      Standard rwsem semantics don't express what we really want to do here and
      don't allow for nested locking. Fix this by replacing the rwsem with an
      atomic_t which we can control more finely. Allow the AFU to be locked
      multiple times so long as there are no readers.
      
      Fixes: 14a3ae34 ("cxl: Prevent read/write to AFU config space while AFU not configured")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
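
To make the "lockable multiple times so long as there are no readers" idea concrete, here is a minimal userspace sketch using C11 atomics in place of the kernel's atomic_t. All names (`struct afu_state`, `afu_reader_get()`, `afu_try_lock()` and so on) are hypothetical, the encoding of the counter is an assumption made for the example, and the lock is written as a non-blocking try-lock; the driver itself may wait for readers to drain instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed encoding: state >= 0 counts active readers, state == -1 means locked. */
struct afu_state {
    atomic_int state;
};

/* Reader side: succeeds only while the AFU is not locked. */
static bool afu_reader_get(struct afu_state *s)
{
    int cur = atomic_load(&s->state);

    while (cur >= 0) {
        /* On failure, cur is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&s->state, &cur, cur + 1))
            return true;
    }
    return false;
}

static void afu_reader_put(struct afu_state *s)
{
    int prev = atomic_fetch_sub(&s->state, 1);

    assert(prev > 0);   /* a put without a matching get would be a bug */
    (void)prev;
}

/* Writer side: succeeds if there are no readers, or if already locked. */
static bool afu_try_lock(struct afu_state *s)
{
    int expected = 0;

    if (atomic_compare_exchange_strong(&s->state, &expected, -1))
        return true;
    return expected == -1;   /* nested lock: already held, so allow it */
}

static void afu_unlock(struct afu_state *s)
{
    atomic_store(&s->state, 0);
}
```

Unlike a rwsem, taking the lock a second time along the error-detected and then remove path simply succeeds, which is the property the commit message asks for.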
  13. 03 Feb, 2017 1 commit
  14. 25 Jan, 2017 3 commits
  15. 15 Dec, 2016 1 commit
  16. 25 Nov, 2016 1 commit
  17. 23 Nov, 2016 1 commit
  18. 18 Nov, 2016 5 commits
  19. 24 Oct, 2016 1 commit
  20. 19 Oct, 2016 1 commit
    • cxl: Prevent adapter reset if an active context exists · 70b565bb
      Authored by Vaibhav Jain
      This patch prevents resetting the cxl adapter via sysfs in the presence
      of one or more active cxl_context instances on it. This protects against
      an unrecoverable error that occurs when the PSL still owns a dirty cache
      line after reset and the host tries to touch the same cache line. If a
      forced reset of the card is required irrespective of any active
      contexts, the int value -1 can be stored in the 'reset' sysfs attribute
      of the card.
      
      The patch introduces a new atomic_t member named contexts_num inside
      struct cxl that holds the number of active contexts attached to the
      card, which is checked against '0' before proceeding with the reset. To
      guard against a race condition where a context is activated just after
      the reset check is performed, contexts_num is atomically set to '-1'
      after the reset check to indicate that no more contexts can be
      activated on the card.
      
      Before activating a context we atomically test whether contexts_num is
      non-negative and, if so, increment its value by one. If the value of
      contexts_num is negative, the card is about to be reset and context
      activation fails with an error at that point.
      
      Fixes: 62fa19d4 ("cxl: Add ability to reset the card")
      Cc: stable@vger.kernel.org # v4.0+
      Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
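
The counting scheme in this commit message can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions rather than the driver code: the function names and `struct adapter_sketch` are hypothetical, only the contexts_num member name comes from the commit message, and the choice of -EBUSY as the error value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* contexts_num >= 0: number of active contexts; -1: reset pending. */
struct adapter_sketch {
    atomic_int contexts_num;
};

/* Activate a context: increment only while the value is non-negative. */
static int context_get(struct adapter_sketch *a)
{
    int cur = atomic_load(&a->contexts_num);

    while (cur >= 0) {
        /* On failure, cur is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&a->contexts_num, &cur, cur + 1))
            return 0;
    }
    return -EBUSY;   /* assumed error code: reset is pending */
}

/* Detach a context that was previously activated. */
static void context_put(struct adapter_sketch *a)
{
    int prev = atomic_fetch_sub(&a->contexts_num, 1);

    assert(prev > 0);   /* a put without a matching get would be a bug */
    (void)prev;
}

/* Reset check: move 0 -> -1 in one atomic step, refuse otherwise. */
static int reset_lock(struct adapter_sketch *a)
{
    int expected = 0;

    if (atomic_compare_exchange_strong(&a->contexts_num, &expected, -1))
        return 0;
    return -EBUSY;      /* active contexts exist, or a reset is already pending */
}

/* After the reset completes, allow contexts to be activated again. */
static void reset_unlock(struct adapter_sketch *a)
{
    int expected = -1;

    atomic_compare_exchange_strong(&a->contexts_num, &expected, 0);
}
```

Because the check for zero and the switch to -1 happen in a single compare-and-exchange, a context cannot slip in between the reset check and the reset itself.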
  21. 04 Oct, 2016 2 commits
  22. 13 Sep, 2016 1 commit