1. 09 3月, 2016 12 次提交
  2. 05 1月, 2016 1 次提交
  3. 24 11月, 2015 1 次提交
    • V
      cxl: Fix possible idr warning when contexts are released · 1b5df59e
      Vaibhav Jain 提交于
      An idr warning is reported when a context is release after the capi card
      is unbound from the cxl driver via sysfs. Below are the steps to
      reproduce:
      
      1. Create multiple afu contexts in an user-space application using libcxl.
      2. Unbind capi card from cxl using command of form
         echo <capi-card-pci-addr> > /sys/bus/pci/drivers/cxl-pci/unbind
      3. Exit/kill the application owning afu contexts.
      
      After above steps a warning message is usually seen in the kernel logs
      of the form "idr_remove called for id=<context-id> which is not
      allocated."
      
      This is caused by the function cxl_release_afu which destroys the
      contexts_idr table. So when a context is release no entry for context pe
      is found in the contexts_idr table and idr code prints this warning.
      
      This patch fixes this issue by increasing & decreasing the ref-count on
      the afu device when a context is initialized or when its freed
      respectively. This prevents the afu from being released until all the
      afu contexts have been released. The patch introduces two new functions
      namely cxl_afu_get/put that manage the ref-count on the afu device.
      
      Also the patch removes code inside cxl_dev_context_init that increases ref
      on the afu device as its guaranteed to be alive during this function.
      Reported-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Acked-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1b5df59e
  4. 01 10月, 2015 1 次提交
  5. 30 8月, 2015 2 次提交
  6. 18 8月, 2015 1 次提交
    • I
      cxl: Add alternate MMIO error handling · d9232a3d
      Ian Munsie 提交于
      userspace programs using cxl currently have to use two strategies for
      dealing with MMIO errors simultaneously. They have to check every read
      for a return of all Fs in case the adapter has gone away and the kernel
      has not yet noticed, and they have to deal with SIGBUS in case the
      kernel has already noticed, invalidated the mapping and marked the
      context as failed.
      
      In order to simplify things, this patch adds an alternative approach
      where the kernel will return a page filled with Fs instead of delivering
      a SIGBUS. This allows userspace to only need to deal with one of these
      two error paths, and is intended for use in libraries that use cxl
      transparently and may not be able to safely install a signal handler.
      
      This approach will only work if certain constraints are met. Namely, if
      the application is both reading and writing to an address in the problem
      state area it cannot assume that a non-FF read is OK, as it may just be
      reading out a value it has previously written. Further - since only one
      page is used per context a write to a given offset would be visible when
      reading the same offset from a different page in the mapping (this only
      applies within a single context, not between contexts).
      
      An application could deal with this by e.g. making sure it also reads
      from a read-only offset after any reads to a read/write offset.
      
      Due to these constraints, this functionality must be explicitly
      requested by userspace when starting the context by passing in the
      CXL_START_WORK_ERR_FF flag.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Acked-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d9232a3d
  7. 14 8月, 2015 5 次提交
    • D
      cxl: EEH support · 9e8df8a2
      Daniel Axtens 提交于
      EEH (Enhanced Error Handling) allows a driver to recover from the
      temporary failure of an attached PCI card. Enable basic CXL support
      for EEH.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9e8df8a2
    • D
      cxl: Allow the kernel to trust that an image won't change on PERST. · 13e68d8b
      Daniel Axtens 提交于
      Provide a kernel API and a sysfs entry which allow a user to specify
      that when a card is PERSTed, it's image will stay the same, allowing
      it to participate in EEH.
      
      cxl_reset is used to reflash the card. In that case, we cannot safely
      assert that the image will not change. Therefore, disallow cxl_reset
      if the flag is set.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      13e68d8b
    • D
      cxl: Allocate and release the SPA with the AFU · 05155772
      Daniel Axtens 提交于
      Previously the SPA was allocated and freed upon entering and leaving
      AFU-directed mode. This causes some issues for error recovery - contexts
      hold a pointer inside the SPA, and they may persist after the AFU has
      been detached.
      
      We would ideally like to allocate the SPA when the AFU is allocated, and
      release it until the AFU is released. However, we don't know how big the
      SPA needs to be until we read the AFU descriptor.
      
      Therefore, restructure the code:
      
       - Allocate the SPA only once, on the first attach.
      
       - Release the SPA only when the entire AFU is being released (not
         detached). Guard the release with a NULL check, so we don't free
         if it was never allocated (e.g. dedicated mode)
      Acked-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      05155772
    • D
      cxl: Drop commands if the PCI channel is not in normal state · 0b3f9c75
      Daniel Axtens 提交于
      If the PCI channel has gone down, don't attempt to poke the hardware.
      
      We need to guard every time cxl_whatever_(read|write) is called. This
      is because a call to those functions will dereference an offset into an
      mmio register, and the mmio mappings get invalidated in the EEH
      teardown.
      
      Check in the read/write functions in the header.
      We give them the same semantics as usual PCI operations:
       - a write to a channel that is down is ignored.
       - a read from a channel that is down returns all fs.
      
      Also, we try to access the MMIO space of a vPHB device as part of the
      PCI disable path. Because that's a read that bypasses most of our usual
      checks, we handle it explicitly.
      
      As far as user visible warnings go:
       - Check link state in file ops, return -EIO if down.
       - Be reasonably quiet if there's an error in a teardown path,
         or when we already know the hardware is going down.
       - Throw a big WARN if someone tries to start a CXL operation
         while the card is down. This gives a useful stacktrace for
         debugging whatever is doing that.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0b3f9c75
    • D
      cxl: Convert MMIO read/write macros to inline functions · 588b34be
      Daniel Axtens 提交于
      We're about to make these more complex, so make them functions
      first.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      588b34be
  8. 03 6月, 2015 11 次提交
  9. 06 2月, 2015 1 次提交
    • I
      cxl: Export optional AFU configuration record in sysfs · b087e619
      Ian Munsie 提交于
      An AFU may optionally contain one or more PCIe like configuration
      records, which can be used to identify the AFU.
      
      This patch adds support for exposing the raw config space and the
      vendor, device and class code under sysfs. These will appear in a
      subdirectory of the AFU device corresponding with the configuration
      record number, e.g.
      
      cat /sys/class/cxl/afu0.0/cr0/vendor
      0x1014
      
      cat /sys/class/cxl/afu0.0/cr0/device
      0x4350
      
      cat /sys/class/cxl/afu0.0/cr0/class
      0x120000
      
      hexdump -C /sys/class/cxl/afu0.0/cr0/config
      00000000  14 10 50 43 00 00 00 00  06 00 00 12 00 00 00 00  |..PC............|
      00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
      *
      00000100
      
      These files behave in much the same way as the equivalent files for PCI
      devices, with one exception being that the config file is currently
      read-only and restricted to the root user. It is not necessarily
      required to be this strict, but we currently do not have a compelling
      use-case to make it writable and/or world-readable, so I erred on the
      side of being restrictive.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b087e619
  10. 22 1月, 2015 2 次提交
  11. 29 12月, 2014 1 次提交
    • I
      cxl: Disable AFU debug flag · d6a6af2c
      Ian Munsie 提交于
      Upon inspection of the implementation specific registers, it was
      discovered that the high bit of the implementation specific RXCTL
      register was enabled, which enables the DEADB00F debug feature.
      
      The debug feature causes MMIO reads to a disabled AFU to respond with
      0xDEADB00F instead of all Fs. In general this should not be visible as
      the kernel will only allow MMIO access to enabled AFUs, but there may be
      some circumstances where an AFU may become disabled while it is use.
      One such case would be an AFU designed to only be used in the dedicated
      process mode and to disable itself after it has completed it's work
      (however even in that case the effects of this debug flag would be
      limited as the userspace application must have completed any required
      MMIO accesses before the AFU disables itself with or without the flag).
      
      This patch removes the debug flag and replaces the magic value
      programmed into this register with a preprocessor define so it is
      clearer what the rest of this initialisation does.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d6a6af2c
  12. 12 12月, 2014 2 次提交
    • I
      cxl: Unmap MMIO regions when detaching a context · b123429e
      Ian Munsie 提交于
      If we need to force detach a context (e.g. due to EEH or simply force
      unbinding the driver) we should prevent the userspace contexts from
      being able to access the Problem State Area MMIO region further, which
      they may have mapped with mmap().
      
      This patch unmaps any mapped MMIO regions when detaching a userspace
      context.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b123429e
    • I
      cxl: Change contexts_lock to a mutex to fix sleep while atomic bug · ee41d11d
      Ian Munsie 提交于
      We had a known sleep while atomic bug if a CXL device was forcefully
      unbound while it was in use. This could occur as a result of EEH, or
      manually induced with something like this while the device was in use:
      
      echo 0000:01:00.0 > /sys/bus/pci/drivers/cxl-pci/unbind
      
      The issue was that in this code path we iterated over each context and
      forcefully detached it with the contexts_lock spin lock held, however
      the detach also needed to take the spu_mutex, and call schedule.
      
      This patch changes the contexts_lock to a mutex so that we are not in
      atomic context while doing the detach, thereby avoiding the sleep while
      atomic.
      
      Also delete the related TODO comment, which suggested an alternate
      solution which turned out to not be workable.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ee41d11d