1. 24 Jun 2016, 1 commit
  2. 27 May 2016, 1 commit
    • IB/core: Make device counter infrastructure dynamic · b40f4757
      Authored by Christoph Lameter
      In practice, each RDMA device has a unique set of counters that the
      hardware implements.  Having a central set of counters that they must
      all adhere to is limiting and causes many useful counters to not be
      available.
      
      Therefore we create a dynamic counter registration infrastructure.
      
      The driver must implement a stats structure allocation routine, in
      which it places the directory name it wants, a list of
      names for all of the counters, an array of u64 counters themselves,
      plus a few generic configuration options.
      
      We then implement a core routine to create a sysfs file for each
      of the named stats elements, and a core routine to retrieve the
      stats when any of the sysfs attribute files are read.
      
      To avoid excessive beating on the stats generation routine in the
      drivers, the core code also caches the stats for a short period of
      time, so that reading all of the stats in a given device's
      directory does not trigger a separate stats generation call per
      file read.
      
      Future work will attempt to standardize just the shared stats
      elements, and possibly add a method to get the stats via netlink
      in addition to sysfs.
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Mark Bloch <markb@mellanox.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      [ Add caching, make structure names more informative, add i40iw support,
        other significant rewrites from the original patch ]
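      
      As a rough driver-side sketch of the registration interface this
      commit describes (the counter names and all foo_* identifiers are
      illustrative, not from the patch; the two hook names follow the
      callbacks the patch adds to struct ib_device):
      
        static const char * const foo_counter_names[] = {
                "rx_pkts",
                "tx_pkts",
                "rx_drops",
        };
      
        /* Allocation hook: name the counters and pick a cache lifespan. */
        static struct rdma_hw_stats *foo_alloc_hw_stats(struct ib_device *ibdev,
                                                        u8 port_num)
        {
                /* Reads within the lifespan window are served from the
                 * core's cached values instead of hitting the hardware. */
                return rdma_alloc_hw_stats_struct(foo_counter_names,
                                                  ARRAY_SIZE(foo_counter_names),
                                                  RDMA_HW_STATS_DEFAULT_LIFESPAN);
        }
      
        /* Read hook: refresh stats->value[] and report how many counters
         * were filled in (foo_hw_read_counter() is a hypothetical helper
         * standing in for the actual hardware access). */
        static int foo_get_hw_stats(struct ib_device *ibdev,
                                    struct rdma_hw_stats *stats,
                                    u8 port_num, int index)
        {
                int i;
      
                for (i = 0; i < stats->num_counters; i++)
                        stats->value[i] = foo_hw_read_counter(ibdev, port_num, i);
                return stats->num_counters;
        }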
  3. 27 Apr 2016, 1 commit
  4. 01 Mar 2016, 2 commits
  5. 24 Dec 2015, 2 commits
  6. 29 Oct 2015, 2 commits
  7. 22 Oct 2015, 1 commit
  8. 31 Aug 2015, 1 commit
  9. 13 Jun 2015, 2 commits
  10. 04 Jun 2015, 1 commit
  11. 02 Jun 2015, 1 commit
  12. 21 May 2015, 2 commits
  13. 19 May 2015, 1 commit
  14. 23 Nov 2014, 1 commit
  15. 22 Jul 2014, 1 commit
  16. 16 Jul 2014, 2 commits
  17. 11 Jun 2014, 1 commit
    • iw_cxgb4: Allocate and use IQs specifically for indirect interrupts · cf38be6d
      Authored by Hariprasad Shenai
      Currently indirect interrupts for RDMA CQs funnel through the LLD's RDMA
      RXQs, which also handle direct interrupts for offload CPLs during RDMA
      connection setup/teardown.  The intended T4 usage model, however, is to
      have indirect interrupts flow through dedicated IQs, i.e., not to mix
      indirect interrupts with CPL messages in an IQ.  This patch adds the
      concept of RDMA concentrator IQs, or CIQs, setup and maintained by the
      LLD and exported to iw_cxgb4 for use when creating CQs. RDMA CPLs will
      flow through the LLD's RDMA RXQs, and CQ interrupts flow through the
      CIQs.
      
      Design:
      
      cxgb4 creates and exports an array of CIQs for the RDMA ULD.  These IQs
      are sized according to the maximum number of CQs available at adapter init.
      In addition, these IQs don't need FL buffers since they only service
      indirect interrupts.  One CIQ is setup per RX channel similar to the
      RDMA RXQs.
      
      iw_cxgb4 will utilize these CIQs based on the vector value passed into
      create_cq().  The num_comp_vectors advertised by iw_cxgb4 will be the
      number of CIQs configured, and thus the vector value will be the index
      into the array of CIQs.
      
      Based on original work by Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
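      
      A minimal sketch of the vector-to-CIQ mapping described above (the
      field names abbreviate the cxgb4 LLD info structure and are an
      approximation, not the exact code):
      
        /* At registration: advertise one completion vector per CIQ. */
        dev->ibdev.num_comp_vectors = dev->rdev.lldi.nciq;
      
        /* In create_cq(): the caller's vector selects the ingress queue,
         * indexing the CIQ id array exported by the LLD (0 <= vector < nciq). */
        iqid = dev->rdev.lldi.ciq_ids[vector];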
  18. 06 Jun 2014, 1 commit
    • RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_resp · b7dfa889
      Authored by Yann Droneaud
      The i386 ABI disagrees with most other ABIs regarding alignment of
      data types larger than 4 bytes: on most ABIs, padding must be added
      at the end of the structure, while it is not required on i386.
      
      So on most ABIs, struct c4iw_alloc_ucontext_resp gets implicitly
      padded to be aligned on an 8-byte multiple, while on i386 no such
      padding is added.
      
      The tool pahole can be used to find such implicit padding:
      
        $ pahole --anon_include \
                 --nested_anon_include \
                 --recursive \
                 --class_name c4iw_alloc_ucontext_resp \
                 drivers/infiniband/hw/cxgb4/iw_cxgb4.o
      
      Then, structure layout can be compared between i386 and x86_64:
      
        --- obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt   2014-03-28 11:43:05.547432195 +0100
        +++ obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
        @@ -2,9 +2,8 @@ struct c4iw_alloc_ucontext_resp {
                __u64                      status_page_key;      /*     0     8 */
                __u32                      status_page_size;     /*     8     4 */
      
        -       /* size: 12, cachelines: 1, members: 2 */
        -       /* last cacheline: 12 bytes */
        +       /* size: 16, cachelines: 1, members: 2 */
        +       /* padding: 4 */
        +       /* last cacheline: 16 bytes */
         };
      
      This ABI disagreement will make an x86_64 kernel try to write past the
      buffer provided by an i386 binary.
      
      Once boundary checking is implemented, the x86_64 kernel will refuse
      to write past the buffer provided by i386 userspace, and the uverb
      will fail.
      
      If the structure is on a page boundary and the next page is not
      mapped, ib_copy_to_udata() will fail and the uverb will fail.
      
      Additionally, as reported by Dan Carpenter, without the implicit
      padding being properly cleared, an information leak would take place
      in most architectures.
      
      This patch adds explicit padding to struct c4iw_alloc_ucontext_resp
      and, like 92b0ca7c ("IB/mlx5: Fix stack info leak in
      mlx5_ib_alloc_ucontext()"), makes c4iw_alloc_ucontext() not write
      this padding field to userspace.  This way, an x86_64 kernel will be
      able to write struct c4iw_alloc_ucontext_resp as expected by both
      unpatched and patched i386 libcxgb4.
      
      Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
      Link: http://marc.info/?i=1395848977.3297.15.camel@localhost.localdomain
      Link: http://marc.info/?i=20140328082428.GH25192@mwanda
      Cc: <stable@vger.kernel.org>
      Fixes: 05eb2389 ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes")
      Reported-by: Yann Droneaud <ydroneaud@opteya.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Acked-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
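      
      The shape of the fix, as a sketch (see the commit itself for the
      exact code): an explicit trailing pad in the response structure,
      plus a copy length that stops short of it so unpatched i386
      binaries with the 12-byte layout keep working:
      
        struct c4iw_alloc_ucontext_resp {
                __u64 status_page_key;
                __u32 status_page_size;
                __u32 reserved;  /* explicit padding, never copied to userspace */
        };
      
        /* in c4iw_alloc_ucontext(): */
        ret = ib_copy_to_udata(udata, &uresp,
                               sizeof(uresp) - sizeof(uresp.reserved));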
  19. 12 Apr 2014, 1 commit
  20. 29 Mar 2014, 1 commit
  21. 15 Mar 2014, 1 commit
    • cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes · 05eb2389
      Authored by Steve Wise
      The current logic suffers from a slow response time in disabling user
      DB usage, and also fails to avoid DB FIFO drops under heavy load.
      This commit fixes these deficiencies and makes the avoidance logic
      more effective.
      This is done by more efficiently notifying the ULDs of potential DB
      problems, and implements a smoother flow control algorithm in iw_cxgb4,
      which is the ULD that puts the most load on the DB FIFO.
      
      Design:
      
      cxgb4:
      
      Direct ULD callback from the DB FULL/DROP interrupt handler.  This allows
      the ULD to stop doing user DB writes as quickly as possible.
      
      While user DB usage is disabled, the LLD will accumulate DB write events
      for its queues.  Then once DB usage is reenabled, a single DB write is
      done for each queue with its accumulated write count.  This reduces the
      load put on the DB FIFO when reenabling.
      
      iw_cxgb4:
      
      Instead of marking each QP to indicate DB writes are disabled, we
      create a device-global status page that each user process maps.
      This allows iw_cxgb4 to set a single bit to disable all DB writes
      for all user QPs, instead of traversing the idr of all the active
      QPs.  If libcxgb4 doesn't support this, then we fall back to the old
      approach of marking each QP.  Thus the new driver works with an
      older libcxgb4.
      
      When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes
      via the status page and transition the DB state to STOPPED.  As user
      processes see that DB writes are disabled, they call into iw_cxgb4
      to submit their DB write events.  Since the DB state is STOPPED,
      the QP trying to write gets enqueued on a new DB "flow control" list.
      As subsequent DB writes are submitted for this flow controlled QP, the
      number of writes is accumulated for each QP on the flow control list.
      So all the user QPs that are actively ringing the DB get put on this
      list and the number of writes they request is accumulated.
      
      When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq
      context, we change the DB state to FLOW_CONTROL, and begin resuming all
      the QPs that are on the flow control list.  This logic runs until
      the flow control list is empty or we exit FLOW_CONTROL mode (due to
      a DB DROP upcall, for example).  QPs are removed from this list, and
      their accumulated DB write counts are written to the DB FIFO.  Sets of QPs,
      called chunks in the code, are removed at one time. The chunk size is 64.
      So 64 QPs are resumed at a time, and before the next chunk is resumed, the
      logic waits (blocks) for the DB FIFO to drain.  This prevents resuming
      too quickly and overflowing the FIFO.  Once the flow control list is
      empty, the DB state transitions back to NORMAL and user QPs are again
      allowed
      to write directly to the user DB register.
      
      The algorithm is designed such that if the DB write load is high enough,
      then all the DB writes get submitted by the kernel using this flow
      controlled approach to avoid DB drops.  As the load lightens, though,
      normal DB writes resume directly from user applications.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
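      
      A sketch of the chunked resume loop described above (the chunk-size
      constant matches the text; the helper names are illustrative):
      
        #define DB_FC_RESUME_SIZE 64  /* QPs resumed per chunk */
      
        static void resume_queues(struct c4iw_dev *dev)
        {
                while (dev->db_state == FLOW_CONTROL &&
                       !list_empty(&dev->db_fc_list)) {
                        /* Pop up to 64 QPs off the flow control list and
                         * write each one's accumulated DB count to the FIFO. */
                        resume_a_chunk(dev);
                        /* Block until the DB FIFO drains so the next chunk
                         * doesn't overflow it. */
                        wait_for_db_fifo_drain(dev);
                }
        }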
  22. 14 Mar 2013, 1 commit
  23. 19 May 2012, 3 commits
  24. 06 Mar 2012, 1 commit
  25. 19 Jul 2011, 1 commit
  26. 10 May 2011, 1 commit
    • RDMA/cxgb4: EEH errors can hang the driver · 2f25e9a5
      Authored by Steve Wise
      A few more EEH fixes:
      
      c4iw_wait_for_reply(): detect fatal EEH condition on timeout and
      return an error.
      
      The iw_cxgb4 driver was only calling ib_deregister_device() on an EEH
      event followed by an ib_register_device() when the device was
      reinitialized.  However, the RDMA core doesn't allow multiple
      iterations of register/deregister by the provider. See
      drivers/infiniband/core/sysfs.c: ib_device_unregister_sysfs() where
      the kobject ref is held until the device is deallocated in
      ib_deallocate_device().  Calling deregister adds this kobj reference,
      and then a subsequent register call will generate a WARN_ON() from the
      kobject subsystem because the kobject is being initialized while it
      is already initialized with the ref held.
      
      So the provider must deregister and dealloc when resetting for an EEH
      event, then alloc/register to re-initialize.  To do this, we cannot
      use the device ptr as our ULD handle since it will change with each
      reallocation.  This commit adds a ULD context struct which is used as
      the ULD handle and contains the device pointer and other state
      needed.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
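      
      A sketch of the ULD-context indirection this commit introduces
      (field names are approximate): the context, not the device, becomes
      the handle given to the LLD, so the device it points to can be
      deallocated and reallocated across EEH resets without the handle
      changing:
      
        struct uld_ctx {
                struct list_head entry;
                struct cxgb4_lld_info lldi;
                struct c4iw_dev *dev;  /* dealloc'd and realloc'd on EEH reset */
        };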
  27. 23 Oct 2010, 1 commit
  28. 29 Sep 2010, 2 commits
  29. 25 May 2010, 1 commit
  30. 22 May 2010, 1 commit
    • IB/core: Allow device-specific per-port sysfs files · 9a6edb60
      Authored by Ralph Campbell
      Add a new parameter to ib_register_device() so that low-level device
      drivers can pass in a pointer to a callback function that will be
      called for each port that is registered in sysfs.  This allows
      low-level device drivers to create files in
      
          /sys/class/infiniband/<hca>/ports/<N>/
      
      without having to poke through the internals of the RDMA sysfs handling.
      
      There is no need for an unregister function since the kobject
      reference will go to zero when ib_unregister_device() is called.
      Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
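      
      After this change, registration takes the per-port hook directly; a
      driver that needs no extra files can pass NULL.  A sketch of the new
      signature and a call site (my_create_port_files is a hypothetical
      driver callback):
      
        int ib_register_device(struct ib_device *device,
                               int (*port_callback)(struct ib_device *,
                                                    u8, struct kobject *));
      
        /* in a low-level driver's init path: */
        ret = ib_register_device(&dev->ibdev, my_create_port_files);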
  31. 06 May 2010, 1 commit