1. 07 Oct, 2011 2 commits
  2. 19 Jul, 2011 2 commits
  3. 26 May, 2011 4 commits
    • RDMA/cma: Save PID of ID's owner · 83e9502d
      Nir Muchtar committed
      Save the PID associated with an RDMA CM ID for reporting via netlink.
      Signed-off-by: Nir Muchtar <nirm@voltaire.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      83e9502d
    • RDMA/cma: Add support for netlink statistics export · 753f618a
      Nir Muchtar committed
      Add callbacks and data types for statistics export of all current
      devices/ids.  The schema for RDMA CM is a series of netlink messages.
      Each one contains an rdma_cm_stat struct.  Additionally, two netlink
      attributes are created for the addresses for each message (if
      applicable).
      
      The types used are:
      RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (The source address for this ID)
      RDMA_NL_RDMA_CM_ATTR_DST_ADDR (The destination address for this ID)
      sockaddr_* structs are encapsulated within these attributes.
      
      In other words, every transaction contains a series of messages like:
      
      -------message 1-------
      struct rdma_cm_id_stats {
             __u32 qp_num;
             __u32 bound_dev_if;
             __u32 port_space;
             __s32 pid;
             __u8 cm_state;
             __u8 node_type;
             __u8 port_num;
             __u8 reserved;
      };
      RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address
      RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address
      -------end 1-------
      -------message 2-------
      struct rdma_cm_id_stats
      RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute
      RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute
      -------end 2-------
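      
      For illustration only: a minimal userspace sketch of walking such a
      dump, assuming the uapi definitions above.  walk_stats() and its
      caller are hypothetical, and header paths may vary by distribution.
      
      #include <linux/netlink.h>
      #include <rdma/rdma_netlink.h>   /* struct rdma_cm_id_stats */
      #include <stdio.h>
      
      /* Walk one receive buffer of netlink messages; each message starts
       * with a struct rdma_cm_id_stats, optionally followed by the
       * SRC_ADDR/DST_ADDR attributes as struct nlattr entries. */
      static void walk_stats(void *buf, int len)
      {
              struct nlmsghdr *nlh;
      
              for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
                      struct rdma_cm_id_stats *stats = NLMSG_DATA(nlh);
      
                      printf("pid %d qp_num %u cm_state %u\n",
                             stats->pid, stats->qp_num, stats->cm_state);
              }
      }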
      Signed-off-by: Nir Muchtar <nirm@voltaire.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      753f618a
    • RDMA/cma: Pass QP type into rdma_create_id() · b26f9b99
      Sean Hefty committed
      The RDMA CM currently infers the QP type from the port space selected
      by the user.  In the future (e.g. with RDMA_PS_IB or XRC), there may not
      be a 1-1 correspondence between port space and QP type.  For netlink
      export of RDMA CM state, we want to export the QP type to userspace,
      so it is cleaner to explicitly associate a QP type to an ID.
      
      Modify rdma_create_id() to allow the user to specify the QP type, and
      use it to make our selections of datagram versus connected mode.
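      
      A short sketch of the resulting kernel API usage, assuming the
      post-patch signature; my_event_handler and my_ctx are caller-supplied
      placeholders:
      
      struct rdma_cm_id *id;
      
      /* connected (RC) service; previously implied by RDMA_PS_TCP alone */
      id = rdma_create_id(my_event_handler, my_ctx, RDMA_PS_TCP, IB_QPT_RC);
      if (IS_ERR(id))
              return PTR_ERR(id);
      
      /* datagram (UD) service */
      id = rdma_create_id(my_event_handler, my_ctx, RDMA_PS_UDP, IB_QPT_UD);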
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      b26f9b99
    • RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h> · 550e5ca7
      Nir Muchtar committed
      Move cma.c's internal definition of enum cma_state to enum rdma_cm_state
      in an exported header so that it can be exported via RDMA netlink.
      Signed-off-by: Nir Muchtar <nirm@voltaire.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      550e5ca7
  4. 10 May, 2011 2 commits
    • RDMA/cma: Add an ID_REUSEADDR option · a9bb7912
      Sean Hefty committed
      Lustre requires that clients bind to a privileged port number before
      connecting to a remote server.  On larger clusters (typically more
      than about 1000 nodes), the number of privileged ports is exhausted,
      resulting in Lustre being unusable.
      
      To handle this, we add support for reusable addresses to the rdma_cm.
      This mimics the behavior of the socket option SO_REUSEADDR.  A user
      may set an rdma_cm_id to reuse an address before calling
      rdma_bind_addr() (explicitly or implicitly).  If set, other
      rdma_cm_id's may be bound to the same address, provided that they all
      have reuse enabled, and there are no active listens.
      
      If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it
      will only succeed if there are no other id's bound to that same
      address.  The reuse option is exported to user space.  The behavior of
      the kernel reuse implementation was verified against that given by
      sockets.
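      
      From user space the option is set through rdma_set_option() before
      binding; a minimal sketch (error handling elided, librdmacm assumed,
      id is the caller's rdma_cm_id):
      
      int reuse = 1;
      
      /* must be set before rdma_bind_addr()/rdma_listen() */
      rdma_set_option(id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
                      &reuse, sizeof(reuse));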
      
      This patch is derived from a patch by Ira Weiny <weiny2@llnl.gov>
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      a9bb7912
    • RDMA/cma: Fix handling of IPv6 addressing in cma_use_port · 43b752da
      Sean Hefty committed
      cma_use_port() assumes that the sockaddr is an IPv4 address.  Since
      IPv6 addressing is supported (and also to support other address
      families) make the code more generic in its address handling.
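      
      The generic handling amounts to dispatching on sa_family rather than
      assuming sockaddr_in; an illustrative helper (not the verbatim cma.c
      code):
      
      static unsigned short addr_port(struct sockaddr *addr)
      {
              switch (addr->sa_family) {
              case AF_INET:
                      return ((struct sockaddr_in *)addr)->sin_port;
              case AF_INET6:
                      return ((struct sockaddr_in6 *)addr)->sin6_port;
              default:
                      return 0;
              }
      }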
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      43b752da
  5. 16 Mar, 2011 2 commits
    • RDMA/cma: Replace global lock in rdma_destroy_id() with id-specific one · a396d43a
      Sean Hefty committed
      rdma_destroy_id currently uses the global rdma cm 'lock' to test if an
      rdma_cm_id has been bound to a device.  This prevents an active
      address resolution callback handler from assigning a device to the
      rdma_cm_id after rdma_destroy_id checks for one.
      
      Instead, we can drop the global lock around the check of the
      rdma_cm_id device pointer by first setting the id state to destroying,
      then flushing all active callbacks.  The latter is accomplished by
      acquiring and releasing the handler_mutex.  Any active handler will
      complete first, and any newly scheduled handlers will find the
      rdma_cm_id in an invalid state.
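      
      The flush idiom is roughly the following sketch (state constant shown
      as later exported; not the verbatim cma.c code):
      
      /* mark the id as dying so newly scheduled handlers bail out */
      cma_exch(id_priv, RDMA_CM_DESTROYING);
      
      /* any handler already running holds handler_mutex; taking and
       * releasing it here waits for that handler to complete */
      mutex_lock(&id_priv->handler_mutex);
      mutex_unlock(&id_priv->handler_mutex);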
      
      In addition to optimizing the current locking scheme, the use of the
      rdma_cm_id mutex is a more intuitive synchronization mechanism than
      that of the global lock.  These changes are based on feedback from
      Doug Ledford <dledford@redhat.com> while he was trying to debug a
      crash in the rdma cm destroy path.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      a396d43a
    • RDMA/cma: Fix crash in request handlers · 25ae21a1
      Sean Hefty committed
      Doug Ledford and Red Hat reported a crash when running the rdma_cm on
      a real-time OS.  The crash has the following call trace:
      
          cm_process_work
             cma_req_handler
                cma_disable_callback
                rdma_create_id
                   kzalloc
                   init_completion
                cma_get_net_info
                cma_save_net_info
                cma_any_addr
                   cma_zero_addr
                rdma_translate_ip
                   rdma_copy_addr
                cma_acquire_dev
                   rdma_addr_get_sgid
                   ib_find_cached_gid
                   cma_attach_to_dev
                ucma_event_handler
                   kzalloc
                   ib_copy_ah_attr_to_user
                cma_comp
      
      [ preempted ]
      
          cma_write
              copy_from_user
              ucma_destroy_id
                 copy_from_user
                 _ucma_find_context
                 ucma_put_ctx
                 ucma_free_ctx
                    rdma_destroy_id
                       cma_exch
                       cma_cancel_operation
                       rdma_node_get_transport
      
              rt_mutex_slowunlock
              bad_area_nosemaphore
              oops_enter
      
      They were able to reproduce the crash multiple times with the
      following details:
      
          Crash seems to always happen on the:
                  mutex_unlock(&conn_id->handler_mutex);
          as conn_id looks to have been freed during this code path.
      
      An examination of the code shows that a race exists in the request
      handlers.  When a new connection request is received, the rdma_cm
      allocates a new connection identifier.  This identifier has a single
      reference count on it.  If a user calls rdma_destroy_id() from another
      thread after receiving a callback, rdma_destroy_id will proceed to
      destroy the id and free the associated memory.  However, the request
      handlers may still be in the process of running.  When control returns
      to the request handlers, they can attempt to access the newly created
      identifiers.
      
      Fix this by holding a reference on the newly created rdma_cm_id until
      the request handler is through accessing it.
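      
      The shape of the fix is the standard reference-count pattern, sketched
      below (cma_deref_id() is the existing release helper; details
      condensed):
      
      /* in cma_req_handler(), after creating conn_id */
      atomic_inc(&conn_id->refcount);   /* pin conn_id across the callback */
      
      ret = conn_id->id.event_handler(&conn_id->id, &event);
      
      mutex_unlock(&conn_id->handler_mutex);
      cma_deref_id(conn_id);            /* safe even if the user destroyed it */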
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Acked-by: Doug Ledford <dledford@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      25ae21a1
  6. 26 Oct, 2010 1 commit
    • IB/core: Add VLAN support for IBoE · af7bd463
      Eli Cohen committed
      Add 802.1q VLAN support to IBoE. The VLAN tag is encoded within the
      GID derived from a link local address in the following way:
      
          GID[11] GID[12] contain the VLAN ID when the GID contains a VLAN.
      
      The 3-bit user priority field of the packets is identical to the 3
      bits of the SL.
      
      For rdma_cm applications, the TOS field is used to generate the SL
      field by shifting it right by 5 bits, effectively taking the 3 most
      significant bits of the TOS field.
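      
      Decoding follows directly from that layout; an illustrative sketch
      (the real helper also validates that the value is a legal VLAN ID):
      
      /* VLAN ID is carried in bytes 11 and 12 of a link-local-derived GID */
      static u16 gid_vlan_id(const union ib_gid *gid)
      {
              return (gid->raw[11] << 8) | gid->raw[12];
      }
      
      /* SL for rdma_cm apps: the 3 most significant bits of the TOS */
      static u8 tos_to_sl(u8 tos)
      {
              return tos >> 5;
      }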
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      af7bd463
  7. 14 Oct, 2010 1 commit
  8. 16 May, 2010 1 commit
  9. 22 Apr, 2010 1 commit
  10. 08 Apr, 2010 1 commit
  11. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
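      
      The per-file fix pattern is simply explicit inclusion, for example:
      
      #include <linux/gfp.h>    /* if only gfp flags such as GFP_KERNEL are used */
      #include <linux/slab.h>   /* if slab allocators such as kmalloc() are used */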
      
      The percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming their availability.  As this
      conversion needs to touch a large number of source files, the following
      script was used as the basis of the conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following:
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  i.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition, and for others adding it to an
         implementation .h or embedding .c file was more appropriate.  This
         step added inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given that I had only a couple of failures from the tests in step 6,
      I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of the
      specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  12. 11 Feb, 2010 1 commit
    • RDMA/cm: Revert association of an RDMA device when binding to loopback · 8523c048
      Sean Hefty committed
      Revert the following change from commit 6f8372b6 ("RDMA/cm: fix
      loopback address support")
      
         The defined behavior of rdma_bind_addr is to associate an RDMA
         device with an rdma_cm_id, as long as the user specified a non-
         zero address.  (i.e. they weren't just trying to reserve a port)
         Currently, if the loopback address is passed to rdma_bind_addr,
         no device is associated with the rdma_cm_id.  Fix this.
      
      It turns out that important apps such as Open MPI depend on
      rdma_bind_addr() NOT associating any RDMA device when binding to a
      loopback address.  Open MPI is being updated to deal with this, but at
      least until a new Open MPI release is available, maintain the previous
      behavior: allow rdma_bind_addr() to succeed, but do not bind to a
      device.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Acked-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      8523c048
  13. 07 Jan, 2010 1 commit
  14. 20 Nov, 2009 6 commits
    • IB/addr: Fix IPv6 routing lookup · d14714df
      Sean Hefty committed
      Include link scope as part of address resolution.  Combine local
      and remote address resolution into a single, simpler code path.
      Fix error checking in the IPv6 routing lookups.
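      
      The link-scope check added here reduces to a sketch like the following
      (the actual helper is cma_check_linklocal(); flow condensed):
      
      #include <net/ipv6.h>
      
      /* a link-local IPv6 address is only usable with a scope id */
      static int check_linklocal(const struct sockaddr_in6 *sin6)
      {
              if ((ipv6_addr_type(&sin6->sin6_addr) & IPV6_ADDR_LINKLOCAL) &&
                  !sin6->sin6_scope_id)
                      return -EINVAL;
              return 0;
      }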
      
      Based on work from:
      David Wilder <dwilder@us.ibm.com>
      Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      
      [ Fix up cma_check_linklocal() for !IPV6 case.  - Roland ]
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      d14714df
    • RDMA/cm: fix loopback address support · 6f8372b6
      Sean Hefty committed
      The RDMA CM is intended to support the use of a loopback address
      when establishing a connection; however, the behavior of the CM
      when loopback addresses are used is confusing and does not always
      work, depending on whether loopback was specified by the server,
      the client, or both.
      
      The defined behavior of rdma_bind_addr is to associate an RDMA
      device with an rdma_cm_id, as long as the user specified a non-
      zero address.  (i.e. they weren't just trying to reserve a port)
      Currently, if the loopback address is passed to rdma_bind_addr,
      no device is associated with the rdma_cm_id.  Fix this.
      
      If a loopback address is specified by the client as the destination
      address for a connection, it will fail to establish a connection.
      This is true even if the server is listening across all addresses or
      on the loopback address itself.  The issue is that the server tries
      to translate the IP address carried in the REQ message to a local
      net_device address, which fails.  The translation is not needed in
      this case, since the REQ carries the actual HW address that should
      be used.
      
      Finally, clean up loopback support to be more transport neutral.
      Replace the separate calls to get/set the sgid and dgid from the
      device address with a single call that behaves correctly depending
      on the format of the device address.  And support both IPv4 and
      IPv6 address formats.
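      
      The single call keys off the device address format; a condensed sketch
      of the idea behind rdma_addr_get_sgid() (offsets per the IPoIB
      hardware address layout; not the verbatim header code):
      
      /* IPoIB hardware addresses carry a 4-byte QPN before the GID;
       * other address formats start at offset 0 */
      static int addr_gid_offset(struct rdma_dev_addr *dev_addr)
      {
              return dev_addr->dev_type == ARPHRD_INFINIBAND ? 4 : 0;
      }
      
      static void addr_get_sgid(struct rdma_dev_addr *dev_addr,
                                union ib_gid *gid)
      {
              memcpy(gid, dev_addr->src_dev_addr + addr_gid_offset(dev_addr),
                     sizeof(*gid));
      }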
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      
      [ Fixed RDS build by s/ib_addr_get/rdma_addr_get/  - Roland ]
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      6f8372b6
    • IB/addr: Store net_device type instead of translating to RDMA transport · c4315d85
      Sean Hefty committed
      The struct rdma_dev_addr stores net_device address information:
      the source device address, destination hardware address, and
      broadcast address.  For consistency, store the net_device type
      rather than converting it to the rdma_node_type.
      
      The type indicates the format of the various hardware addresses,
      which is what we're concerned with, and not the RDMA node type
      that the address may map to.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      c4315d85
    • RDMA/cma: Replace net_device pointer with index · 6266ed6e
      Sean Hefty committed
      Provide the device interface when resolving route information to
      ensure that the correct outbound device is used.  This will also
      simplify processing of sin6_scope_id for IPv6 support.
      
      Based on work from:
      David Wilder <dwilder@us.ibm.com>
      Jason Gunthorpe <jgunthrope@obsidianresearch.com>
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      6266ed6e
    • RDMA/cma: Fix AF_INET6 support in multicast joining · e2e62697
      Jason Gunthorpe committed
      If joining to an AF_INET6 address, we need to map the address to a MGID
      in the same way as the IP stack.  The old code would just fall through to
      the IPv4 case and generate garbage.
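      
      The fix boils down to using the kernel's IPv6 mapping helper for
      AF_INET6 instead of falling through to the IPv4 one; a condensed
      sketch (addr and dev_addr as in the surrounding cma.c context):
      
      unsigned char mc_map[MAX_ADDR_LEN];
      struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr;
      
      /* map the IPv6 multicast address to an MGID as the IP stack does */
      if (addr->sa_family == AF_INET6)
              ipv6_ib_mc_map(&sin6->sin6_addr, dev_addr->broadcast, mc_map);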
      Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      e2e62697
    • RDMA/cma: Correct detection of SA Created MGID · 1c9b2819
      Jason Gunthorpe committed
      RDMA CM treats AF_INET6 addresses that are either 0 or prefixed with
      FF1x:A01B::/32 as MGIDs, but the detection for the prefix was buggy;
      fix it up.
      Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      1c9b2819
  15. 24 Jun, 2009 1 commit
  16. 09 Apr, 2009 1 commit
    • RDMA/cma: Create cm id even when IB port is down · d2ca39f2
      Yossi Etigin committed
      When doing rdma_resolve_addr(), if the relevant IB port is down, the
      function fails and the cm_id is not bound to the correct device.
      Therefore, the application does not have a device handle and cannot wait
      for the port to become active.  The function fails because the
      underlying IPoIB interface is not joined to the broadcast group and
      therefore the SA does not have a multicast record to take a Q_Key
      from.
      
      The fix is to use lazy Q_Key resolution - cma_set_qkey() will set
      id_priv->qkey if it was not set, and will be called just before the
      Q_Key is really required.
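      
      A sketch of the helper's shape (the SA query path for RDMA_PS_IPOIB is
      elided; RDMA_UDP_QKEY is the existing fixed Q_Key):
      
      static int cma_set_qkey(struct rdma_id_private *id_priv)
      {
              if (id_priv->qkey)
                      return 0;                 /* already resolved */
      
              if (id_priv->id.ps == RDMA_PS_UDP) {
                      id_priv->qkey = RDMA_UDP_QKEY;
                      return 0;
              }
      
              /* RDMA_PS_IPOIB: look up the broadcast group's multicast
               * record via the SA and take the Q_Key from it (elided) */
              return -ENOSYS;
      }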
      Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
      Acked-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      d2ca39f2
  17. 02 Apr, 2009 1 commit
  18. 25 Dec, 2008 1 commit
  19. 05 Aug, 2008 1 commit
    • RDMA/cma: Remove padding arrays by using struct sockaddr_storage · 3f446754
      Roland Dreier committed
      There are a few places where the RDMA CM code handles IPv6 by doing
      
      	struct sockaddr		addr;
      	u8			pad[sizeof(struct sockaddr_in6) -
      				    sizeof(struct sockaddr)];
      
      This is fragile and ugly; handle this in a better way with just
      
      	struct sockaddr_storage	addr;
      
      [ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
        switch to struct sockaddr_storage and get rid of padding arrays in
        struct rdma_addr. ]
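      
      The payoff is that one field can hold any address family and callers
      just cast on access; an illustrative fragment:
      
      struct sockaddr_storage addr;   /* large/aligned enough for any family */
      unsigned short port;
      
      if (((struct sockaddr *)&addr)->sa_family == AF_INET6)
              port = ((struct sockaddr_in6 *)&addr)->sin6_port;
      else
              port = ((struct sockaddr_in *)&addr)->sin_port;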
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      3f446754
  20. 23 Jul, 2008 2 commits
  21. 15 Jul, 2008 4 commits
  22. 17 Apr, 2008 1 commit
  23. 31 Mar, 2008 1 commit
  24. 15 Feb, 2008 1 commit
    • RDMA/cma: Do not issue MRA if user rejects connection request · ead595ae
      Sean Hefty committed
      There's an undesirable interaction between issuing MRA requests to
      increase connection timeouts and the listen backlog.
      
      When the rdma_cm receives a connection request, it queues an MRA with
      the ib_cm.  (The ib_cm will send an MRA if it receives a duplicate
      REQ.)  The rdma_cm will then create a new rdma_cm_id and give that to
      the user, which in this case is the rdma_user_cm.
      
      If the listen backlog maintained in the rdma_user_cm is full, it
      destroys the rdma_cm_id, which in turns destroys the ib_cm_id.  The
      ib_cm_id generates a REJ because the state of the ib_cm_id has changed
      to MRA sent, versus REQ received.  When the backlog is full, we just
      want to drop the REQ so that it is retried later.
      
      Fix this by deferring queuing the MRA until after the user of the
      rdma_cm has examined the connection request.
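      
      Concretely, the ib_send_cm_mra() call moves from REQ receipt to after
      the user's callback has accepted the request; a condensed sketch
      (CMA_CM_MRA_SETTING is the rdma_cm's existing service timeout value):
      
      ret = conn_id->id.event_handler(&conn_id->id, &event);
      if (!ret)
              /* user kept the connection: only now extend the peer's
               * timeout with an MRA */
              ib_send_cm_mra(cm_id, CMA_CM_MRA_SETTING, NULL, 0);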
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      ead595ae