1. 25 11月, 2016 1 次提交
  2. 23 11月, 2016 1 次提交
  3. 18 11月, 2016 1 次提交
  4. 04 10月, 2016 1 次提交
  5. 09 8月, 2016 1 次提交
  6. 08 7月, 2016 4 次提交
    • P
      cxl: Refine slice error debug messages · 6e0c50f9
      Philippe Bergheaud 提交于
      The PSL Slice Error Register (PSL_SERR_An) reports implementation
      dependent AFU errors, in the form of a bitmap. The PSL_SERR_An
      register content is printed in the form of hex dump debug message.
      
      This patch decodes the PSL_ERR_An register contents, and prints a
      specific error message for each possible error bit. It also dumps
      the secondary registers AFU_ERR_An and PSL_DSISR_An, that may
      contain extra debug information.
      
      This patch also removes the large WARN message that used to report
      the cxl slice error interrupt, and replaces it by a short informative
      message, that draws attention to AFU implementation errors.
      Signed-off-by: NPhilippe Bergheaud <felix@linux.vnet.ibm.com>
      Acked-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6e0c50f9
    • I
      cxl: Workaround XSL bug that does not clear the RA bit after a reset · 2a4f667a
      Ian Munsie 提交于
      An issue was noted in our debug logs where the XSL would leave the RA
      bit asserted after an AFU reset operation, which would effectively
      prevent further AFU reset operations from working.
      
      Workaround the issue by clearing the RA bit with an MMIO write if it is
      still asserted after any AFU control operation.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Reviewed-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2a4f667a
    • I
      cxl: Fix bug where AFU disable operation had no effect · 5e7823c9
      Ian Munsie 提交于
      The AFU disable operation has a bug where it will not clear the enable
      bit and therefore will have no effect. To date this has likely been
      masked by fact that we perform an AFU reset before the disable, which
      also has the effect of clearing the enable bit, making the following
      disable operation effectively a noop on most hardware. This patch
      modifies the afu_control function to take a parameter to clear from the
      AFU control register so that the disable operation can clear the
      appropriate bit.
      
      This bug was uncovered on the Mellanox CX4, which uses an XSL rather
      than a PSL. On the XSL the reset operation will not complete while the
      AFU is enabled, meaning the enable bit was still set at the start of the
      disable and as a result this bug was hit and the disable also timed out.
      
      Because of this difference in behaviour between the PSL and XSL, this
      patch now makes the reset dependent on the card using a PSL to avoid
      waiting for a timeout on the XSL. It is entirely possible that we may be
      able to drop the reset altogether if it turns out we only ever needed it
      due to this bug - however I am not willing to drop it without further
      regression testing and have added comments to the code explaining the
      background.
      
      This also fixes a small issue where the AFU_Cntl register was read
      outside of the lock that protects it.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Reviewed-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5e7823c9
    • I
      cxl: Fix allocating a minimum of 2 pages for the SPA · 2224b671
      Ian Munsie 提交于
      The Scheduled Process Area is allocated dynamically with enough pages to
      fit at least as many processes as the AFU descriptor indicated. Since
      the calculation is non-trivial, it does this by calculating how many
      processes could fit in an allocation of a given order, and increasing
      that order until it can fit enough processes or hits the maximum
      supported size.
      
      Currently, it will start this search using a SPA of 2 pages instead of
      1. This can waste a page of memory if the AFU's maximum number of
      supported processes was small enough to fit in one page.
      
      Fix the algorithm to start the search at 1 page.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Reviewed-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2224b671
  7. 16 6月, 2016 2 次提交
    • F
      cxl: Abstract the differences between the PSL and XSL · 6d382616
      Frederic Barrat 提交于
      The XSL (Translation Service Layer) is a stripped down version of the
      PSL (Power Service Layer) used in some cards such as the Mellanox CX4.
      
      Like the PSL, it implements the CAIA architecture, but has a number of
      differences, mostly in it's implementation dependent registers. This
      adds an ops structure to abstract these differences to bring initial
      support for XSL CAPI devices.
      
      The XSL does not implement the optional architected SERR register,
      however while it treats it as a reserved register and should work with
      no special treatment, attempting to access it will cause the XSL_FEC
      (First Error Capture) register to be filled out, preventing it from
      capturing any subsequent errors. Therefore, this patch also prevents the
      kernel from trying to set up the SERR register so that the FEC register
      may still be useful, and to save one interrupt.
      
      The XSL also uses a special DMA cxl mode, which uses a slightly
      different init sequence for the CAPP and PHB. The kernel support for
      this will be in a future patch once the corresponding support has been
      merged into skiboot.
      Co-authored-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6d382616
    • I
      cxl: Update process element after allocating interrupts · 292841b0
      Ian Munsie 提交于
      In the kernel API, it is possible to attempt to allocate AFU interrupts
      after already starting a context. Since the process element structure
      used by the hardware is only filled out at the time the context is
      started, it will not be updated with the interrupt numbers that have
      just been allocated and therefore AFU interrupts will not work unless
      they were allocated prior to starting the context.
      
      This can present some difficulties as each CAPI enabled PCI device in
      the kernel API has a default context, which may need to be started very
      early to enable translations, potentially before interrupts can easily
      be set up.
      
      This patch makes the API more flexible to allow interrupts to be
      allocated after a context has already been started and takes care of
      updating the PE structure used by the hardware and notifying it to
      discard any cached copy it may have.
      
      The update is currently performed via a terminate/remove/add sequence.
      This is necessary on some hardware such as the XSL that does not
      properly support the update LLCMD.
      
      Note that this is only supported on powernv at present - attempting to
      perform this ordering on PowerVM will raise a warning.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Reviewed-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      292841b0
  8. 11 5月, 2016 3 次提交
    • I
      cxl: Add kernel API to allow a context to operate with relocate disabled · 7a0d85d3
      Ian Munsie 提交于
      cxl devices typically access memory using an MMU in much the same way as
      the CPU, and each context includes a state register much like the MSR in
      the CPU. Like the CPU, the state register includes a bit to enable
      relocation, which we currently always enable.
      
      In some cases, it may be desirable to allow a device to access memory
      using real addresses instead of effective addresses, so this adds a new
      API, cxl_set_translation_mode, that can be used to disable relocation
      on a given kernel context. This can allow for the creation of a special
      privileged context that the device can use if it needs relocation
      disabled, and can use regular contexts at times when it needs relocation
      enabled.
      
      This interface is only available to users of the kernel API for obvious
      reasons, and will never be supported in a virtualised environment.
      
      This will be used by the upcoming cxl support in the mlx5 driver.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7a0d85d3
    • I
      cxl: Ensure PSL interrupt is configured for contexts with no AFU IRQs · 3c206fa7
      Ian Munsie 提交于
      In the cxl kernel API, it is possible to create a context and start it
      without allocating any interrupts. Since we assign or allocate the PSL
      interrupt when allocating AFU interrupts this will lead to a situation
      where we start the context with no means to take any faults.
      
      The user API is not affected as it always goes through the cxl interrupt
      allocation code paths and will have the PSL interrupt allocated or
      assigned, even if no AFU interrupts were requested.
      
      This checks that at least one interrupt is configured at the time of
      attach, and if not it will assign the multiplexed PSL interrupt for
      powernv, or allocate a single interrupt for PowerVM.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Reviewed-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3c206fa7
    • I
      cxl: Handle num_of_processes larger than can fit in the SPA · 895a7980
      Ian Munsie 提交于
      num_of_process is a 16 bit field, theoretically allowing an AFU to
      support 16K processes, however the scheduled process area currently has
      a maximum size of 1MB, which limits the maximum number of processes to
      7704.
      
      Some AFUs may not necessarily care what the limit is and just want to be
      able to use the maximum by setting the field to 16K. To allow these to
      work, detect this situation and use the maximum size for the SPA.
      
      Downgrade the WARN_ON to a dev_warn.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      895a7980
  9. 27 4月, 2016 1 次提交
  10. 09 3月, 2016 9 次提交
  11. 08 12月, 2015 1 次提交
  12. 07 10月, 2015 1 次提交
    • C
      cxl: Fix number of allocated pages in SPA · 4108efb0
      Christophe Lombard 提交于
      The scheduled process area is currently allocated before assigning the
      correct maximum processes to the AFU, which will mean we only ever
      allocate a fixed number of pages for the scheduled process area. This
      will limit us to 958 processes with 2 x 64K pages. If we try to use more
      processes than that we'd probably overrun the buffer and corrupt memory
      or crash.
      
      AFUs that require three or more interrupts per process will not be
      affected as they are already limited to less processes than that, but we
      could hit it on an AFU that requires 0, 1 or 2 interrupts per process,
      or when using 4K pages.
      
      This patch moves the initialisation of the num_procs to before the SPA
      allocation so that enough pages will be allocated for the number of
      processes that the AFU supports.
      Signed-off-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Cc: <stable@vger.kernel.org> # 3.18+
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4108efb0
  13. 14 8月, 2015 2 次提交
    • D
      cxl: Allocate and release the SPA with the AFU · 05155772
      Daniel Axtens 提交于
      Previously the SPA was allocated and freed upon entering and leaving
      AFU-directed mode. This causes some issues for error recovery - contexts
      hold a pointer inside the SPA, and they may persist after the AFU has
      been detached.
      
      We would ideally like to allocate the SPA when the AFU is allocated, and
      release it until the AFU is released. However, we don't know how big the
      SPA needs to be until we read the AFU descriptor.
      
      Therefore, restructure the code:
      
       - Allocate the SPA only once, on the first attach.
      
       - Release the SPA only when the entire AFU is being released (not
         detached). Guard the release with a NULL check, so we don't free
         if it was never allocated (e.g. dedicated mode)
      Acked-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      05155772
    • D
      cxl: Drop commands if the PCI channel is not in normal state · 0b3f9c75
      Daniel Axtens 提交于
      If the PCI channel has gone down, don't attempt to poke the hardware.
      
      We need to guard every time cxl_whatever_(read|write) is called. This
      is because a call to those functions will dereference an offset into an
      mmio register, and the mmio mappings get invalidated in the EEH
      teardown.
      
      Check in the read/write functions in the header.
      We give them the same semantics as usual PCI operations:
       - a write to a channel that is down is ignored.
       - a read from a channel that is down returns all fs.
      
      Also, we try to access the MMIO space of a vPHB device as part of the
      PCI disable path. Because that's a read that bypasses most of our usual
      checks, we handle it explicitly.
      
      As far as user visible warnings go:
       - Check link state in file ops, return -EIO if down.
       - Be reasonably quiet if there's an error in a teardown path,
         or when we already know the hardware is going down.
       - Throw a big WARN if someone tries to start a CXL operation
         while the card is down. This gives a useful stacktrace for
         debugging whatever is doing that.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0b3f9c75
  14. 06 8月, 2015 1 次提交
  15. 13 7月, 2015 1 次提交
  16. 03 6月, 2015 4 次提交
  17. 22 1月, 2015 1 次提交
    • I
      cxl: Add tracepoints · 9bcf28cd
      Ian Munsie 提交于
      This patch adds tracepoints throughout the cxl driver, which can provide
      insight into:
      
      - Context lifetimes
      - Commands sent to the PSL and AFU and their completion status
      - Segment and page table misses and their resolution
      - PSL and AFU interrupts
      - slbia calls from the powerpc copro_fault code
      
      These tracepoints are mostly intended to aid in debugging (particularly
      for new AFU designs), and may be useful standalone or in conjunction
      with hardware traces collected by the PSL (read out via the trace
      interface in debugfs) and AFUs.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9bcf28cd
  18. 29 12月, 2014 1 次提交
    • I
      cxl: Disable SPAP register when freeing SPA · db7933f3
      Ian Munsie 提交于
      When we deactivate the AFU directed mode we free the scheduled process
      area, but did not clear the register in the hardware that has a pointer
      to it.
      
      This should be fine since we will have already cleared out every context
      and we won't do anything that would cause the hardware to access it
      until after we have allocated a new one, but just to be safe this patch
      clears out the register when we free the page.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      db7933f3
  19. 12 12月, 2014 2 次提交
    • I
      cxl: Add timeout to process element commands · a98e6e9f
      Ian Munsie 提交于
      In the event that something goes wrong in the hardware and it is unable
      to complete a process element comment we would end up polling forever,
      effectively making the associated process unkillable.
      
      This patch adds a timeout to the process element command code path, so
      that we will give up if the hardware does not respond in a reasonable
      time.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a98e6e9f
    • I
      cxl: Change contexts_lock to a mutex to fix sleep while atomic bug · ee41d11d
      Ian Munsie 提交于
      We had a known sleep while atomic bug if a CXL device was forcefully
      unbound while it was in use. This could occur as a result of EEH, or
      manually induced with something like this while the device was in use:
      
      echo 0000:01:00.0 > /sys/bus/pci/drivers/cxl-pci/unbind
      
      The issue was that in this code path we iterated over each context and
      forcefully detached it with the contexts_lock spin lock held, however
      the detach also needed to take the spu_mutex, and call schedule.
      
      This patch changes the contexts_lock to a mutex so that we are not in
      atomic context while doing the detach, thereby avoiding the sleep while
      atomic.
      
      Also delete the related TODO comment, which suggested an alternate
      solution which turned out to not be workable.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ee41d11d
  20. 27 11月, 2014 1 次提交
    • I
      CXL: Return error to PSL if IRQ demultiplexing fails & print clearer warning · 27bbcef2
      Ian Munsie 提交于
      If an AFU has a hardware bug that causes it to acknowledge a context
      terminate or remove while that context has outstanding transactions, it
      is possible for the kernel to receive an interrupt for that context
      after we have removed it from the context list.
      
      The kernel will not be able to demultiplex the interrupt (or worse - if
      we have already reallocated the process handle we could mis-attribute it
      to the new context), and printed a big scary warning.
      
      It did not acknowledge the interrupt, which would effectively halt
      further translation fault processing on the PSL.
      
      This patch makes the warning clearer about the likely cause of the issue
      (i.e. hardware bug) to make it obvious to future AFU designers of what
      needs to be fixed. It also prints out the process handle which can then
      be matched up with hardware and software traces for debugging.
      
      It also acknowledges the interrupt to the PSL with either an address
      error or acknowledge, so that the PSL can continue with other
      translations.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      27bbcef2
  21. 18 11月, 2014 1 次提交
    • I
      cxl: Return error to PSL if IRQ demultiplexing fails & print clearer warning · bc78b05b
      Ian Munsie 提交于
      If an AFU has a hardware bug that causes it to acknowledge a context
      terminate or remove while that context has outstanding transactions, it
      is possible for the kernel to receive an interrupt for that context
      after we have removed it from the context list.
      
      The kernel will not be able to demultiplex the interrupt (or worse - if
      we have already reallocated the process handle we could mis-attribute it
      to the new context), and printed a big scary warning.
      
      It did not acknowledge the interrupt, which would effectively halt
      further translation fault processing on the PSL.
      
      This patch makes the warning clearer about the likely cause of the issue
      (i.e. hardware bug) to make it obvious to future AFU designers of what
      needs to be fixed. It also prints out the process handle which can then
      be matched up with hardware and software traces for debugging.
      
      It also acknowledges the interrupt to the PSL with either an address
      error or acknowledge, so that the PSL can continue with other
      translations.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bc78b05b