1. 17 8月, 2015 1 次提交
  2. 14 8月, 2015 10 次提交
    • D
      cxl: EEH support · 9e8df8a2
      Daniel Axtens 提交于
      EEH (Enhanced Error Handling) allows a driver to recover from the
      temporary failure of an attached PCI card. Enable basic CXL support
      for EEH.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9e8df8a2
    • D
      cxl: Allow the kernel to trust that an image won't change on PERST. · 13e68d8b
      Daniel Axtens 提交于
      Provide a kernel API and a sysfs entry which allow a user to specify
      that when a card is PERSTed, it's image will stay the same, allowing
      it to participate in EEH.
      
      cxl_reset is used to reflash the card. In that case, we cannot safely
      assert that the image will not change. Therefore, disallow cxl_reset
      if the flag is set.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      13e68d8b
    • D
      cxl: Don't remove AFUs/vPHBs in cxl_reset · 4e1efb40
      Daniel Axtens 提交于
      If the driver doesn't participate in EEH, the AFUs will be removed
      by cxl_remove, which will be invoked by EEH.
      
      If the driver does particpate in EEH, the vPHB needs to stick around
      so that the it can particpate.
      
      In both cases, we shouldn't remove the AFU/vPHB.
      Reviewed-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4e1efb40
    • D
      cxl: Refactor AFU init/teardown · d76427b0
      Daniel Axtens 提交于
      As with an adapter, some aspects of initialisation are done only once
      in the lifetime of an AFU: for example, allocating memory, or setting
      up sysfs/debugfs files.
      
      However, we may want to be able to do some parts of the initialisation
      multiple times: for example, in error recovery we want to be able to
      tear down and then re-map IO memory and IRQs.
      
      Therefore, refactor AFU init/teardown as follows.
      
       - Create two new functions: 'cxl_configure_afu', and its pair
         'cxl_deconfigure_afu'. As with the adapter functions,
         these (de)configure resources that do not need to last the entire
         lifetime of the AFU.
      
       - Allocating and releasing memory remain the task of 'cxl_alloc_afu'
         and 'cxl_release_afu'.
      
       - Once-only functions that do not involve allocating/releasing memory
         stay in the overarching 'cxl_init_afu'/'cxl_remove_afu' pair.
         However, the task of picking an AFU mode and activating it has been
         broken out.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d76427b0
    • D
      cxl: Refactor adaptor init/teardown · c044c415
      Daniel Axtens 提交于
      Some aspects of initialisation are done only once in the lifetime of
      an adapter: for example, allocating memory for the adapter,
      allocating the adapter number, or setting up sysfs/debugfs files.
      
      However, we may want to be able to do some parts of the
      initialisation multiple times: for example, in error recovery we
      want to be able to tear down and then re-map IO memory and IRQs.
      
      Therefore, refactor CXL init/teardown as follows.
      
       - Keep the overarching functions 'cxl_init_adapter' and its pair,
         'cxl_remove_adapter'.
      
       - Move all 'once only' allocation/freeing steps to the existing
         'cxl_alloc_adapter' function, and its pair 'cxl_release_adapter'
         (This involves moving allocation of the adapter number out of
         cxl_init_adapter.)
      
       - Create two new functions: 'cxl_configure_adapter', and its pair
         'cxl_deconfigure_adapter'. These two functions 'wire up' the
         hardware --- they (de)configure resources that do not need to
         last the entire lifetime of the adapter
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c044c415
    • D
      cxl: Clean up adapter MMIO unmap path. · 575e6986
      Daniel Axtens 提交于
      - MMIO pointer unmapping is guarded by a null pointer check.
         However, iounmap doesn't null the pointer, just invalidate it.
         Therefore, explicitly null the pointer after unmapping.
      
       - afu_desc_mmio also needs to be unmapped.
      
       - PCI regions are allocated in cxl_map_adapter_regs.
         Therefore they should be released in unmap, not elsewhere.
      Acked-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      575e6986
    • D
      cxl: Make IRQ release idempotent · e640d2fc
      Daniel Axtens 提交于
      Check if an IRQ is mapped before releasing it.
      
      This will simplify future EEH code by allowing unconditional unmapping
      of IRQs.
      Acked-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      e640d2fc
    • D
      cxl: Allocate and release the SPA with the AFU · 05155772
      Daniel Axtens 提交于
      Previously the SPA was allocated and freed upon entering and leaving
      AFU-directed mode. This causes some issues for error recovery - contexts
      hold a pointer inside the SPA, and they may persist after the AFU has
      been detached.
      
      We would ideally like to allocate the SPA when the AFU is allocated, and
      release it until the AFU is released. However, we don't know how big the
      SPA needs to be until we read the AFU descriptor.
      
      Therefore, restructure the code:
      
       - Allocate the SPA only once, on the first attach.
      
       - Release the SPA only when the entire AFU is being released (not
         detached). Guard the release with a NULL check, so we don't free
         if it was never allocated (e.g. dedicated mode)
      Acked-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      05155772
    • D
      cxl: Drop commands if the PCI channel is not in normal state · 0b3f9c75
      Daniel Axtens 提交于
      If the PCI channel has gone down, don't attempt to poke the hardware.
      
      We need to guard every time cxl_whatever_(read|write) is called. This
      is because a call to those functions will dereference an offset into an
      mmio register, and the mmio mappings get invalidated in the EEH
      teardown.
      
      Check in the read/write functions in the header.
      We give them the same semantics as usual PCI operations:
       - a write to a channel that is down is ignored.
       - a read from a channel that is down returns all fs.
      
      Also, we try to access the MMIO space of a vPHB device as part of the
      PCI disable path. Because that's a read that bypasses most of our usual
      checks, we handle it explicitly.
      
      As far as user visible warnings go:
       - Check link state in file ops, return -EIO if down.
       - Be reasonably quiet if there's an error in a teardown path,
         or when we already know the hardware is going down.
       - Throw a big WARN if someone tries to start a CXL operation
         while the card is down. This gives a useful stacktrace for
         debugging whatever is doing that.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0b3f9c75
    • D
      cxl: Convert MMIO read/write macros to inline functions · 588b34be
      Daniel Axtens 提交于
      We're about to make these more complex, so make them functions
      first.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      588b34be
  3. 12 8月, 2015 2 次提交
  4. 11 8月, 2015 1 次提交
  5. 06 8月, 2015 1 次提交
  6. 16 7月, 2015 2 次提交
  7. 13 7月, 2015 2 次提交
  8. 10 7月, 2015 1 次提交
    • D
      cxl: Check if afu is not null in cxl_slbia · 2c069a11
      Daniel Axtens 提交于
      The pointer to an AFU in the adapter's list of AFUs can be null
      if we're in the process of removing AFUs. The afu_list_lock
      doesn't guard against this.
      
      Say we have 2 slices, and we're in the process of removing cxl.
       - We remove the AFUs in order (see cxl_remove). In cxl_remove_afu
         for AFU 0, we take the lock, set adapter->afu[0] = NULL, and
         release the lock.
       - Then we get an slbia. In cxl_slbia we take the lock, and set
         afu = adapter->afu[0], which is NULL.
       - Therefore our attempt to check afu->enabled will blow up.
      
      Therefore, check if afu is a null pointer before dereferencing it.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Acked-by: NMichael Neuling <mikey@neuling.org>
      Acked-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2c069a11
  9. 08 7月, 2015 2 次提交
    • I
      cxl: Fix off by one error allowing subsequent mmap page to be accessed · 10a5894f
      Ian Munsie 提交于
      It was discovered that if a process mmaped their problem state area they
      were able to access one page more than expected, potentially allowing
      them to access the problem state area of an unrelated process.
      
      This was due to a simple off by one error in the mmap fault handler
      introduced in 0712dc7e ("cxl: Fix issues
      when unmapping contexts"), which is fixed in this patch.
      
      Cc: stable@vger.kernel.org
      Fixes: 0712dc7e ("cxl: Fix issues when unmapping contexts")
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      10a5894f
    • I
      cxl: Fail mmap if requested mapping is larger than assigned problem state area · 5caaf534
      Ian Munsie 提交于
      This patch makes the mmap call fail outright if the requested region is
      larger than the problem state area assigned to the context so the error
      is reported immediately rather than waiting for an attempt to access an
      address out of bounds.
      
      Although we never expect users to map more than the assigned problem
      state area and are not aware of anyone doing this (other than for
      testing), this does have the potential to break users if someone has
      used a larger range regardless. I'm submitting it for consideration, but
      if this change is not considered acceptable the previous patch is
      sufficient to prevent access out of bounds without breaking anyone.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5caaf534
  10. 07 7月, 2015 1 次提交
  11. 06 7月, 2015 2 次提交
  12. 19 6月, 2015 2 次提交
  13. 07 6月, 2015 1 次提交
  14. 03 6月, 2015 12 次提交