1. 14 October 2011 (7 commits)
    • RDMA/uverbs: Export XRC INI QPs to userspace · 9977f4f6
      Sean Hefty committed
      XRC INI QPs are similar to send-only RC QPs.  Allow user space to create
      INI QPs.  Note that INI QPs do not require receive CQs.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
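      A minimal libibverbs sketch of what this export enables, assuming a
      device and CQ set up elsewhere (IBV_QPT_XRC_SEND is the libibverbs
      spelling of an XRC INI QP; the capacity values are arbitrary):

          /* Create a send-only XRC INI QP: note no receive CQ attached. */
          #include <infiniband/verbs.h>

          struct ibv_qp *create_xrc_ini_qp(struct ibv_context *ctx,
                                           struct ibv_pd *pd,
                                           struct ibv_cq *send_cq)
          {
                  struct ibv_qp_init_attr_ex attr = {
                          .send_cq   = send_cq,
                          .cap       = { .max_send_wr = 64,
                                         .max_send_sge = 1 },
                          .qp_type   = IBV_QPT_XRC_SEND,
                          .comp_mask = IBV_QP_INIT_ATTR_PD,
                          .pd        = pd,
                  };

                  return ibv_create_qp_ex(ctx, &attr);  /* NULL on error */
          }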
    • RDMA/uverbs: Export XRC SRQs to user space · 8541f8de
      Sean Hefty committed
      Creating an XRC SRQ requires more information than we can exchange
      using the existing create SRQ ABI.  Provide an enhanced create ABI
      for extended SRQ types.
      
      Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>
      and Roland Dreier <roland@purestorage.com>
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
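      A hedged sketch of the extended create path from userspace, using
      the libibverbs extended-SRQ interface (the XRCD, CQ, and limits are
      placeholders supplied by the caller):

          /* An XRC SRQ needs an XRCD and a CQ, which the basic create-SRQ
           * ABI had no room for; the extended attr struct carries both. */
          #include <infiniband/verbs.h>

          struct ibv_srq *create_xrc_srq(struct ibv_context *ctx,
                                         struct ibv_pd *pd,
                                         struct ibv_xrcd *xrcd,
                                         struct ibv_cq *cq)
          {
                  struct ibv_srq_init_attr_ex attr = {
                          .attr      = { .max_wr = 128, .max_sge = 1 },
                          .comp_mask = IBV_SRQ_INIT_ATTR_TYPE |
                                       IBV_SRQ_INIT_ATTR_PD |
                                       IBV_SRQ_INIT_ATTR_XRCD |
                                       IBV_SRQ_INIT_ATTR_CQ,
                          .srq_type  = IBV_SRQT_XRC,
                          .pd        = pd,
                          .xrcd      = xrcd,
                          .cq        = cq,
                  };

                  return ibv_create_srq_ex(ctx, &attr);
          }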
    • RDMA/uverbs: Export XRC domains to user space · 53d0bd1e
      Sean Hefty committed
      Allow user space to create XRC domains.  Because XRCDs are expected to
      be shared among multiple processes, we use inodes to identify an XRCD.
      
      Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
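      A sketch of how the inode-based sharing looks from userspace with
      libibverbs; the path here is a placeholder, since any file that both
      processes can open names the same XRCD through its inode:

          #include <fcntl.h>
          #include <infiniband/verbs.h>

          struct ibv_xrcd *open_shared_xrcd(struct ibv_context *ctx,
                                            const char *path)
          {
                  struct ibv_xrcd_init_attr attr;
                  int fd = open(path, O_RDONLY | O_CREAT, 0600);

                  if (fd < 0)
                          return NULL;

                  attr.comp_mask = IBV_XRCD_INIT_ATTR_FD |
                                   IBV_XRCD_INIT_ATTR_OFLAGS;
                  attr.fd        = fd;
                  attr.oflags    = O_CREAT;  /* create XRCD if absent */

                  return ibv_open_xrcd(ctx, &attr);
          }

      Two unrelated processes that open the same file receive handles to
      the same XRC domain.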
    • RDMA/verbs: Cleanup XRC TGT QPs when destroying XRCD · d3d72d90
      Sean Hefty committed
      XRC TGT QPs are intended to be shared among multiple users and
      processes.  Allow the destruction of an XRC TGT QP to be done explicitly
      through ib_destroy_qp() or when the XRCD is destroyed.
      
      To support destroying an XRC TGT QP, we need to track TGT QPs with the
      XRCD.  When the XRCD is destroyed, all tracked XRC TGT QPs are also
      cleaned up.
      
      To avoid stale reference issues, if a user is holding a reference on a
      TGT QP, we increment a reference count on the QP.  The user releases the
      reference by calling ib_release_qp.  This releases any access to the QP
      from a user above verbs, but allows the QP to continue to exist until
      destroyed by the XRCD.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
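      An illustrative kernel-side sketch of the lifetime rule described
      above; ib_release_qp() is the verb this patch introduces:

          #include <rdma/ib_verbs.h>

          static int drop_tgt_qp(struct ib_qp *tgt_qp)
          {
                  /* Release this user's handle only.  The XRC TGT QP
                   * stays alive for other users until its XRCD is
                   * deallocated, at which point the XRCD's tracking
                   * list destroys any remaining TGT QPs. */
                  return ib_release_qp(tgt_qp);
          }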
    • RDMA/core: Add XRC QPs · b42b63cf
      Sean Hefty committed
      XRC ("eXtended reliable connected") is an IB transport that provides
      better scalability by allowing senders to specify which shared receive
      queue (SRQ) should be used to receive a message, which essentially
      allows one transport context (QP connection) to serve multiple
      destinations (as long as they share an adapter, of course).
      
      XRC communication is between an initiator (INI) QP and a target (TGT)
      QP.  Target QPs are associated with SRQs through an XRCD.  An XRC TGT QP
      behaves like a receive-only RD QP.  XRC INI QPs behave similarly to RC
      QPs, except that work requests posted to an XRC INI QP must specify the
      remote SRQ that is the target of the work request.
      
      We define two new QP types for XRC, to distinguish between INI and TGT
      QPs, and update the core layer to support XRC QPs.
      
      This patch is derived from work by Jack Morgenstein
      <jackm@dev.mellanox.co.il>
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
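      A hedged libibverbs sketch of the send side: a work request on an
      XRC INI QP carries the number of the remote SRQ it targets (the SGE
      is assumed to be prepared by the caller):

          #include <infiniband/verbs.h>

          int post_xrc_send(struct ibv_qp *ini_qp, struct ibv_sge *sge,
                            uint32_t remote_srqn)
          {
                  struct ibv_send_wr *bad_wr;
                  struct ibv_send_wr wr = {
                          .wr_id      = 1,
                          .sg_list    = sge,
                          .num_sge    = 1,
                          .opcode     = IBV_WR_SEND,
                          .send_flags = IBV_SEND_SIGNALED,
                          /* the defining XRC twist: name the remote SRQ */
                          .qp_type.xrc.remote_srqn = remote_srqn,
                  };

                  return ibv_post_send(ini_qp, &wr, &bad_wr);
          }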
    • RDMA/core: Add XRC SRQ type · 418d5130
      Sean Hefty committed
      XRC ("eXtended reliable connected") is an IB transport that provides
      better scalability by allowing senders to specify which shared receive
      queue (SRQ) should be used to receive a message, which essentially
      allows one transport context (QP connection) to serve multiple
      destinations (as long as they share an adapter, of course).
      
      XRC defines SRQs that are specifically used by XRC connections.  Expand
      the SRQ code to support XRC SRQs.  An XRC SRQ is currently restricted to
      only XRC use according to the IB XRC Annex.
      
      Portions of this patch were derived from work by
      Jack Morgenstein <jackm@dev.mellanox.co.il>.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
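      A sketch of the expanded kernel interface as of this series: the SRQ
      init attributes gain a type field and an ext union, so an XRC SRQ
      carries its XRCD and CQ at create time (limits are placeholders):

          #include <rdma/ib_verbs.h>

          static struct ib_srq *create_kernel_xrc_srq(struct ib_pd *pd,
                                                      struct ib_xrcd *xrcd,
                                                      struct ib_cq *cq)
          {
                  struct ib_srq_init_attr attr = {
                          .attr     = { .max_wr = 128, .max_sge = 1 },
                          .srq_type = IB_SRQT_XRC,
                          .ext.xrc  = { .xrcd = xrcd, .cq = cq },
                  };

                  return ib_create_srq(pd, &attr);
          }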
    • RDMA/core: Add SRQ type field · 96104eda
      Sean Hefty committed
      Currently, there is only a single ("basic") type of SRQ, but with XRC
      support we will add a second.  Prepare for this by defining an SRQ type
      and setting all current users to IB_SRQT_BASIC.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  2. 13 October 2011 (1 commit)
    • RDMA/core: Add XRC domain support · 59991f94
      Sean Hefty committed
      XRC ("eXtended reliable connected") is an IB transport that provides
      better scalability by allowing senders to specify which shared receive
      queue (SRQ) should be used to receive a message, which essentially
      allows one transport context (QP connection) to serve multiple
      destinations (as long as they share an adapter, of course).
      
      A few new concepts are introduced to support this.  This patch adds:
      
       - A new device capability flag, IB_DEVICE_XRC, which low-level
         drivers set to indicate that a device supports XRC.
       - A new object type, XRC domains (struct ib_xrcd), and new verbs
         ib_alloc_xrcd()/ib_dealloc_xrcd().  XRCDs are used to limit which
         XRC SRQs an incoming message can target.
      
      This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
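      A hedged kernel-side sketch of the new verbs; the capability check
      uses ib_query_device() as it existed at the time of this series:

          #include <rdma/ib_verbs.h>

          static struct ib_xrcd *get_xrcd(struct ib_device *device)
          {
                  struct ib_device_attr dev_attr;
                  int ret = ib_query_device(device, &dev_attr);

                  if (ret)
                          return ERR_PTR(ret);
                  if (!(dev_attr.device_cap_flags & IB_DEVICE_XRC))
                          return ERR_PTR(-ENOSYS);

                  /* paired with ib_dealloc_xrcd() on teardown */
                  return ib_alloc_xrcd(device);
          }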
  3. 19 July 2011 (4 commits)
  4. 18 July 2011 (1 commit)
  5. 05 July 2011 (1 commit)
  6. 10 June 2011 (1 commit)
    • rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679
      Greg Rose committed
      The message size allocated for rtnl ifinfo dumps was limited to
      a single page.  This is not enough for additional interface info
      available with devices that support SR-IOV and caused a bug in
      which VF info would not be displayed if more than approximately
      40 VFs were created per interface.
      
      Implement a new function pointer for the rtnl_register service that will
      calculate the amount of data required for the ifinfo dump and allocate
      enough data to satisfy the request.
      Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
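      The shape of the change, hedged to the signatures this patch
      introduces: a dump registration now passes a third callback that
      reports the minimum buffer size the dump will need.

          /* In rtnetlink's own GETLINK registration, rtnl_calcit computes
           * the ifinfo dump size (base ifinfo plus per-VF attribute
           * space), so the dump buffer is no longer capped at one page. */
          rtnl_register(PF_UNSPEC, RTM_GETLINK,
                        rtnl_getlink,      /* doit */
                        rtnl_dump_ifinfo,  /* dumpit */
                        rtnl_calcit);      /* new: minimum dump size */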
  7. 26 May 2011 (4 commits)
    • RDMA/cma: Save PID of ID's owner · 83e9502d
      Nir Muchtar committed
      Save the PID associated with an RDMA CM ID for reporting via netlink.
      Signed-off-by: Nir Muchtar <nirm@voltaire.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/cma: Add support for netlink statistics export · 753f618a
      Nir Muchtar committed
      Add callbacks and data types for statistics export of all current
      devices/ids.  The schema for RDMA CM is a series of netlink messages.
      Each one contains an rdma_cm_id_stats struct.  Additionally, two
      netlink attributes are created for the addresses of each message (if
      applicable).
      
      The attribute types used are:
      RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (the source address for this ID)
      RDMA_NL_RDMA_CM_ATTR_DST_ADDR (the destination address for this ID)
      sockaddr_* structs are encapsulated within these attributes.
      
      In other words, every transaction contains a series of messages like:
      
      -------message 1-------
      struct rdma_cm_id_stats {
             __u32 qp_num;
             __u32 bound_dev_if;
             __u32 port_space;
             __s32 pid;
             __u8 cm_state;
             __u8 node_type;
             __u8 port_num;
             __u8 reserved;
      }
      RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address
      RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address
      -------end 1-------
      -------message 2-------
      struct rdma_cm_id_stats
      RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute
      RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute
      -------end 2-------
      Signed-off-by: Nir Muchtar <nirm@voltaire.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
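      A hedged userspace sketch of walking one reply buffer in this
      schema, assuming the rdma_cm_id_stats definition exported in
      <rdma/rdma_netlink.h> (error handling elided):

          #include <stdio.h>
          #include <linux/netlink.h>
          #include <rdma/rdma_netlink.h>

          static void walk_cm_stats(void *buf, int len)
          {
                  struct nlmsghdr *nlh;

                  for (nlh = buf; NLMSG_OK(nlh, len);
                       nlh = NLMSG_NEXT(nlh, len)) {
                          struct rdma_cm_id_stats *s = NLMSG_DATA(nlh);

                          printf("pid %d qpn %u state %d\n",
                                 s->pid, s->qp_num, s->cm_state);
                          /* RDMA_NL_RDMA_CM_ATTR_SRC_ADDR and
                           * RDMA_NL_RDMA_CM_ATTR_DST_ADDR follow the
                           * struct as netlink attributes when present. */
                  }
          }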
    • RDMA/cma: Pass QP type into rdma_create_id() · b26f9b99
      Sean Hefty committed
      The RDMA CM currently infers the QP type from the port space selected
      by the user.  In the future (e.g. with RDMA_PS_IB or XRC), there may not
      be a 1-1 correspondence between port space and QP type.  For netlink
      export of RDMA CM state, we want to export the QP type to userspace,
      so it is cleaner to explicitly associate a QP type to an ID.
      
      Modify rdma_create_id() to allow the user to specify the QP type, and
      use it to make our selections of datagram versus connected mode.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
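      A minimal sketch of a caller after this change (kernel signature as
      of this commit; the event handler is hypothetical):

          #include <rdma/rdma_cm.h>

          static int my_handler(struct rdma_cm_id *id,
                                struct rdma_cm_event *event);

          static struct rdma_cm_id *make_connected_id(void *ctx)
          {
                  /* RDMA_PS_TCP now pairs with an explicit RC QP type
                   * instead of having the type inferred from the port
                   * space */
                  return rdma_create_id(my_handler, ctx, RDMA_PS_TCP,
                                        IB_QPT_RC);
          }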
    • RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h> · 550e5ca7
      Nir Muchtar committed
      Move cma.c's internal definition of enum cma_state to enum rdma_cm_state
      in an exported header so that it can be exported via RDMA netlink.
      Signed-off-by: Nir Muchtar <nirm@voltaire.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  8. 24 May 2011 (4 commits)
  9. 21 May 2011 (2 commits)
  10. 11 May 2011 (1 commit)
  11. 10 May 2011 (3 commits)
    • RDMA/iwcm: Get rid of enum iw_cm_event_status · d0c49bf3
      Roland Dreier committed
      The IW_CM_EVENT_STATUS_xxx values were used in only a couple of places;
      cma.c uses -Exxx values instead, and so do the amso1100, cxgb3 and cxgb4
      drivers -- only nes was using the enum values (with the mild consequence
      that all nes connection failures were treated as generic errors rather
      than reported as timeouts or rejections).
      
      We can fix this confusion by getting rid of enum iw_cm_event_status and
      using a plain int for struct iw_cm_event.status, and converting nes to
      use -Exxx as the other iWARP drivers do.
      
      This also gets rid of the warning
      
          drivers/infiniband/core/cma.c: In function 'cma_iw_handler':
          drivers/infiniband/core/cma.c:1333:3: warning: case value '4294967185' not in enumerated type 'enum iw_cm_event_status'
          drivers/infiniband/core/cma.c:1336:3: warning: case value '4294967186' not in enumerated type 'enum iw_cm_event_status'
          drivers/infiniband/core/cma.c:1332:3: warning: case value '4294967192' not in enumerated type 'enum iw_cm_event_status'
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Sean Hefty <sean.hefty@intel.com>
      Reviewed-by: Faisal Latif <faisal.latif@intel.com>
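      With the enum gone, a handler sees ordinary -Exxx values; a hedged
      sketch mirroring the cases cma_iw_handler distinguishes:

          #include <linux/errno.h>
          #include <rdma/iw_cm.h>

          static void classify_iw_status(const struct iw_cm_event *event)
          {
                  switch (event->status) {
                  case 0:             /* connection established */
                          break;
                  case -ECONNRESET:   /* peer reset the connection */
                          break;
                  case -ECONNREFUSED: /* remote side rejected us */
                          break;
                  case -ETIMEDOUT:    /* no response from the peer */
                          break;
                  }
          }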
    • RDMA/cma: Add an ID_REUSEADDR option · a9bb7912
      Hefty, Sean committed
      Lustre requires that clients bind to a privileged port number before
      connecting to a remote server.  On larger clusters (typically more
      than about 1000 nodes), the number of privileged ports is exhausted,
      resulting in Lustre being unusable.
      
      To handle this, we add support for reusable addresses to the rdma_cm.
      This mimics the behavior of the socket option SO_REUSEADDR.  A user
      may set an rdma_cm_id to reuse an address before calling
      rdma_bind_addr() (explicitly or implicitly).  If set, other
      rdma_cm_ids may be bound to the same address, provided that they all
      have reuse enabled, and there are no active listens.
      
      If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it
      will only succeed if there are no other IDs bound to that same
      address.  The reuse option is exported to user space.  The behavior of
      the kernel reuse implementation was verified against that given by
      sockets.
      
      This patch is derived from a patch by Ira Weiny <weiny2@llnl.gov>.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
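      A hedged sketch of the userspace mirror of this option in librdmacm:
      mark the ID reusable before binding, much as SO_REUSEADDR is set on
      a socket before bind():

          #include <rdma/rdma_cma.h>

          static int bind_reusable(struct rdma_cm_id *id,
                                   struct sockaddr *addr)
          {
                  int reuse = 1;
                  int ret = rdma_set_option(id, RDMA_OPTION_ID,
                                            RDMA_OPTION_ID_REUSEADDR,
                                            &reuse, sizeof reuse);

                  if (ret)
                          return ret;
                  return rdma_bind_addr(id, addr);
          }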
    • RDMA/cma: Fix handling of IPv6 addressing in cma_use_port · 43b752da
      Hefty, Sean committed
      cma_use_port() assumes that the sockaddr is an IPv4 address.  Since
      IPv6 addressing is supported (and also to support other address
      families), make the code more generic in its address handling.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
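      The family-agnostic pattern in miniature (a standalone sketch; the
      kernel helper differs in detail):

          #include <sys/socket.h>
          #include <netinet/in.h>

          /* returns the port in network byte order, or 0 if unknown */
          static unsigned short sockaddr_port(const struct sockaddr *addr)
          {
                  switch (addr->sa_family) {
                  case AF_INET:
                          return ((const struct sockaddr_in *)
                                  addr)->sin_port;
                  case AF_INET6:
                          return ((const struct sockaddr_in6 *)
                                  addr)->sin6_port;
                  default:
                          return 0;
                  }
          }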
  12. 19 March 2011 (1 commit)
  13. 18 March 2011 (1 commit)
  14. 16 March 2011 (4 commits)
    • RDMA/cma: Replace global lock in rdma_destroy_id() with id-specific one · a396d43a
      Sean Hefty committed
      rdma_destroy_id currently uses the global RDMA CM 'lock' to test if an
      rdma_cm_id has been bound to a device.  This prevents an active
      address resolution callback handler from assigning a device to the
      rdma_cm_id after rdma_destroy_id checks for one.
      
      Instead, we can replace the use of the global lock around the check to
      the rdma_cm_id device pointer by setting the id state to destroying,
      then flushing all active callbacks.  The latter is accomplished by
      acquiring and releasing the handler_mutex.  Any active handler will
      complete first, and any newly scheduled handlers will find the
      rdma_cm_id in an invalid state.
      
      In addition to optimizing the current locking scheme, the use of the
      rdma_cm_id mutex is a more intuitive synchronization mechanism than
      that of the global lock.  These changes are based on feedback from
      Doug Ledford <dledford@redhat.com> while he was trying to debug a
      crash in the RDMA CM destroy path.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
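      An illustrative reduction of the flush idiom, with hypothetical
      types standing in for cma.c's rdma_id_private (the real code also
      sets the state atomically via cma_exch()):

          #include <linux/mutex.h>

          enum id_state { ID_ACTIVE, ID_DESTROYING };

          struct my_id {
                  enum id_state state;
                  struct mutex  handler_mutex;
          };

          static void flush_handlers(struct my_id *id)
          {
                  id->state = ID_DESTROYING;  /* new callbacks back off */

                  /* a running handler holds handler_mutex; once we can
                   * take it, every in-flight callback has completed */
                  mutex_lock(&id->handler_mutex);
                  mutex_unlock(&id->handler_mutex);
          }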
    • IB/cm: Cancel pending LAP message when exiting IB_CM_ESTABLISH state · 8d8ac865
      Sean Hefty committed
      This problem was reported by Moni Shoua <monis@mellanox.com> and Amir
      Vadai <amirv@mellanox.com>:
      
      	When destroying a cm_id from a context of a work queue and if
      	the lap_state of this cm_id is IB_CM_LAP_SENT, we need to
      	release the reference of this id that was taken upon the send
      	of the LAP message.  Otherwise, if the expected APR message
      	gets lost, it is only after a long time that the reference
      	will be released, while during that the work handler thread is
      	not available to process other things.
      
      It turns out that we need to cancel any pending LAP messages whenever
      we transition out of the IB_CM_ESTABLISH state.  This occurs when
      disconnecting - either sending or receiving a DREQ.  It can also
      happen in a corner case where we receive a REJ message after sending
      an RTU, followed by a LAP.  Add checks and cancel any outstanding LAP
      messages in these three cases.
      
      Canceling the LAP when sending a DREQ fixes the destroy problem
      reported by Moni.  When a cm_id is destroyed in the IB_CM_ESTABLISHED
      state, it sends a DREQ to the remote side to notify the peer that the
      connection is going away.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/cm: Bump reference count on cm_id before invoking callback · 29963437
      Sean Hefty committed
      When processing a SIDR REQ, the ib_cm allocates a new cm_id.  The
      refcount of the cm_id is initialized to 1.  However, cm_process_work
      will decrement the refcount after invoking all callbacks.  The result
      is that the cm_id will end up with refcount set to 0 by the end of the
      SIDR REQ handler.
      
      If a user tries to destroy the cm_id, the destruction will proceed,
      under the incorrect assumption that no other threads are referencing
      the cm_id.  This can lead to a crash when the cm callback thread tries
      to access the cm_id.
      
      This problem was noticed as part of a larger investigation with kernel
      crashes in the rdma_cm when running on a real time OS.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Acked-by: Doug Ledford <dledford@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/cma: Fix crash in request handlers · 25ae21a1
      Sean Hefty committed
      Doug Ledford and Red Hat reported a crash when running the rdma_cm on
      a real-time OS.  The crash has the following call trace:
      
          cm_process_work
             cma_req_handler
                cma_disable_callback
                rdma_create_id
                   kzalloc
                   init_completion
                cma_get_net_info
                cma_save_net_info
                cma_any_addr
                   cma_zero_addr
                rdma_translate_ip
                   rdma_copy_addr
                cma_acquire_dev
                   rdma_addr_get_sgid
                   ib_find_cached_gid
                   cma_attach_to_dev
                ucma_event_handler
                   kzalloc
                   ib_copy_ah_attr_to_user
                cma_comp
      
      [ preempted ]
      
          cma_write
              copy_from_user
              ucma_destroy_id
                 copy_from_user
                 _ucma_find_context
                 ucma_put_ctx
                 ucma_free_ctx
                    rdma_destroy_id
                       cma_exch
                       cma_cancel_operation
                       rdma_node_get_transport
      
              rt_mutex_slowunlock
              bad_area_nosemaphore
              oops_enter
      
      They were able to reproduce the crash multiple times with the
      following details:
      
          Crash seems to always happen on the:
                  mutex_unlock(&conn_id->handler_mutex);
          as conn_id looks to have been freed during this code path.
      
      An examination of the code shows that a race exists in the request
      handlers.  When a new connection request is received, the rdma_cm
      allocates a new connection identifier.  This identifier has a single
      reference count on it.  If a user calls rdma_destroy_id() from another
      thread after receiving a callback, rdma_destroy_id will proceed to
      destroy the id and free the associated memory.  However, the request
      handlers may still be in the process of running.  When control returns
      to the request handlers, they can attempt to access the newly created
      identifiers.
      
      Fix this by holding a reference on the newly created rdma_cm_id until
      the request handler is through accessing it.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Acked-by: Doug Ledford <dledford@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
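      An illustrative reduction of the fix; helper and field names follow
      cma.c but the body is simplified:

          /* Pin the freshly created ID across the upcall so a concurrent
           * rdma_destroy_id() cannot free it underneath us. */
          static void deliver_req_event(struct rdma_id_private *conn_id,
                                        struct rdma_cm_event *event)
          {
                  atomic_inc(&conn_id->refcount);
                  conn_id->id.event_handler(&conn_id->id, event);

                  /* ...remaining handler work may still touch conn_id... */

                  cma_deref_id(conn_id);  /* may free conn_id now */
          }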
  15. 13 March 2011 (3 commits)
  16. 03 March 2011 (1 commit)
  17. 29 January 2011 (1 commit)