1. 11 5月, 2016 1 次提交
  2. 22 4月, 2016 1 次提交
  3. 09 3月, 2016 14 次提交
  4. 05 1月, 2016 1 次提交
  5. 24 11月, 2015 1 次提交
    • V
      cxl: Fix possible idr warning when contexts are released · 1b5df59e
      Vaibhav Jain 提交于
      An idr warning is reported when a context is release after the capi card
      is unbound from the cxl driver via sysfs. Below are the steps to
      reproduce:
      
      1. Create multiple afu contexts in an user-space application using libcxl.
      2. Unbind capi card from cxl using command of form
         echo <capi-card-pci-addr> > /sys/bus/pci/drivers/cxl-pci/unbind
      3. Exit/kill the application owning afu contexts.
      
      After above steps a warning message is usually seen in the kernel logs
      of the form "idr_remove called for id=<context-id> which is not
      allocated."
      
      This is caused by the function cxl_release_afu which destroys the
      contexts_idr table. So when a context is release no entry for context pe
      is found in the contexts_idr table and idr code prints this warning.
      
      This patch fixes this issue by increasing & decreasing the ref-count on
      the afu device when a context is initialized or when its freed
      respectively. This prevents the afu from being released until all the
      afu contexts have been released. The patch introduces two new functions
      namely cxl_afu_get/put that manage the ref-count on the afu device.
      
      Also the patch removes code inside cxl_dev_context_init that increases ref
      on the afu device as its guaranteed to be alive during this function.
      Reported-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Acked-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1b5df59e
  6. 01 10月, 2015 1 次提交
  7. 30 8月, 2015 2 次提交
  8. 18 8月, 2015 1 次提交
    • I
      cxl: Add alternate MMIO error handling · d9232a3d
      Ian Munsie 提交于
      userspace programs using cxl currently have to use two strategies for
      dealing with MMIO errors simultaneously. They have to check every read
      for a return of all Fs in case the adapter has gone away and the kernel
      has not yet noticed, and they have to deal with SIGBUS in case the
      kernel has already noticed, invalidated the mapping and marked the
      context as failed.
      
      In order to simplify things, this patch adds an alternative approach
      where the kernel will return a page filled with Fs instead of delivering
      a SIGBUS. This allows userspace to only need to deal with one of these
      two error paths, and is intended for use in libraries that use cxl
      transparently and may not be able to safely install a signal handler.
      
      This approach will only work if certain constraints are met. Namely, if
      the application is both reading and writing to an address in the problem
      state area it cannot assume that a non-FF read is OK, as it may just be
      reading out a value it has previously written. Further - since only one
      page is used per context a write to a given offset would be visible when
      reading the same offset from a different page in the mapping (this only
      applies within a single context, not between contexts).
      
      An application could deal with this by e.g. making sure it also reads
      from a read-only offset after any reads to a read/write offset.
      
      Due to these constraints, this functionality must be explicitly
      requested by userspace when starting the context by passing in the
      CXL_START_WORK_ERR_FF flag.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Acked-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d9232a3d
  9. 14 8月, 2015 5 次提交
    • D
      cxl: EEH support · 9e8df8a2
      Daniel Axtens 提交于
      EEH (Enhanced Error Handling) allows a driver to recover from the
      temporary failure of an attached PCI card. Enable basic CXL support
      for EEH.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9e8df8a2
    • D
      cxl: Allow the kernel to trust that an image won't change on PERST. · 13e68d8b
      Daniel Axtens 提交于
      Provide a kernel API and a sysfs entry which allow a user to specify
      that when a card is PERSTed, it's image will stay the same, allowing
      it to participate in EEH.
      
      cxl_reset is used to reflash the card. In that case, we cannot safely
      assert that the image will not change. Therefore, disallow cxl_reset
      if the flag is set.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      13e68d8b
    • D
      cxl: Allocate and release the SPA with the AFU · 05155772
      Daniel Axtens 提交于
      Previously the SPA was allocated and freed upon entering and leaving
      AFU-directed mode. This causes some issues for error recovery - contexts
      hold a pointer inside the SPA, and they may persist after the AFU has
      been detached.
      
      We would ideally like to allocate the SPA when the AFU is allocated, and
      release it until the AFU is released. However, we don't know how big the
      SPA needs to be until we read the AFU descriptor.
      
      Therefore, restructure the code:
      
       - Allocate the SPA only once, on the first attach.
      
       - Release the SPA only when the entire AFU is being released (not
         detached). Guard the release with a NULL check, so we don't free
         if it was never allocated (e.g. dedicated mode)
      Acked-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      05155772
    • D
      cxl: Drop commands if the PCI channel is not in normal state · 0b3f9c75
      Daniel Axtens 提交于
      If the PCI channel has gone down, don't attempt to poke the hardware.
      
      We need to guard every time cxl_whatever_(read|write) is called. This
      is because a call to those functions will dereference an offset into an
      mmio register, and the mmio mappings get invalidated in the EEH
      teardown.
      
      Check in the read/write functions in the header.
      We give them the same semantics as usual PCI operations:
       - a write to a channel that is down is ignored.
       - a read from a channel that is down returns all fs.
      
      Also, we try to access the MMIO space of a vPHB device as part of the
      PCI disable path. Because that's a read that bypasses most of our usual
      checks, we handle it explicitly.
      
      As far as user visible warnings go:
       - Check link state in file ops, return -EIO if down.
       - Be reasonably quiet if there's an error in a teardown path,
         or when we already know the hardware is going down.
       - Throw a big WARN if someone tries to start a CXL operation
         while the card is down. This gives a useful stacktrace for
         debugging whatever is doing that.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0b3f9c75
    • D
      cxl: Convert MMIO read/write macros to inline functions · 588b34be
      Daniel Axtens 提交于
      We're about to make these more complex, so make them functions
      first.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      588b34be
  10. 03 6月, 2015 11 次提交
  11. 06 2月, 2015 1 次提交
    • I
      cxl: Export optional AFU configuration record in sysfs · b087e619
      Ian Munsie 提交于
      An AFU may optionally contain one or more PCIe like configuration
      records, which can be used to identify the AFU.
      
      This patch adds support for exposing the raw config space and the
      vendor, device and class code under sysfs. These will appear in a
      subdirectory of the AFU device corresponding with the configuration
      record number, e.g.
      
      cat /sys/class/cxl/afu0.0/cr0/vendor
      0x1014
      
      cat /sys/class/cxl/afu0.0/cr0/device
      0x4350
      
      cat /sys/class/cxl/afu0.0/cr0/class
      0x120000
      
      hexdump -C /sys/class/cxl/afu0.0/cr0/config
      00000000  14 10 50 43 00 00 00 00  06 00 00 12 00 00 00 00  |..PC............|
      00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
      *
      00000100
      
      These files behave in much the same way as the equivalent files for PCI
      devices, with one exception being that the config file is currently
      read-only and restricted to the root user. It is not necessarily
      required to be this strict, but we currently do not have a compelling
      use-case to make it writable and/or world-readable, so I erred on the
      side of being restrictive.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b087e619
  12. 22 1月, 2015 1 次提交
    • R
      cxl: Add ability to reset the card · 62fa19d4
      Ryan Grimm 提交于
      Adds reset to sysfs which will PERST the card. If load_image_on_perst is set
      to "user" or "factory", the PERST will cause that image to be loaded.
      
      load_image_on_perst is set to "user" for production.
      
      "none" could be used for debugging. The PSL trace arrays are preserved which
      then can be read through debugfs.
      
      PERST also triggers CAPP recovery. An HMI comes in, which is handled by EEH.
      EEH unbinds the driver, calls into Sapphire to reinitialize the PHB, then
      rebinds the driver.
      Signed-off-by: NRyan Grimm <grimm@linux.vnet.ibm.com>
      Acked-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      62fa19d4