1. 13 Oct 2011, 1 commit
    • RDMA/core: Add XRC domain support · 59991f94
      Committed by Sean Hefty
      XRC ("eXtended reliable connected") is an IB transport that provides
      better scalability by allowing senders to specify which shared receive
      queue (SRQ) should be used to receive a message, which essentially
      allows one transport context (QP connection) to serve multiple
      destinations (as long as they share an adapter, of course).
      
      A few new concepts are introduced to support this.  This patch adds:
      
       - A new device capability flag, IB_DEVICE_XRC, which low-level
         drivers set to indicate that a device supports XRC.
       - A new object type, XRC domains (struct ib_xrcd), and new verbs
         ib_alloc_xrcd()/ib_dealloc_xrcd().  XRCDs are used to limit which
         XRC SRQs an incoming message can target.
      
      This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
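The XRC domain rules in the commit above can be illustrated with a small userspace model. This is a sketch, not the kernel code. Every model_* name and MODEL_DEVICE_XRC are hypothetical stand-ins for IB_DEVICE_XRC, struct ib_xrcd and ib_alloc_xrcd()/ib_dealloc_xrcd(). The "refuse to free while in use" rule is also an assumption. The model shows the two ideas the patch introduces:

- a domain can only be allocated on a device that advertises XRC;
- an incoming message may only target an SRQ in the receiving QP's domain.

```c
/* Userspace model only; all model_* names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_DEVICE_XRC (1u << 0)      /* stand-in for IB_DEVICE_XRC */

struct model_device {
	unsigned int cap_flags;             /* capability bits set by the driver */
};

struct model_xrcd {                     /* stand-in for struct ib_xrcd */
	struct model_device *device;
	int usecnt;                         /* SRQs currently using this domain */
};

struct model_srq {
	struct model_xrcd *xrcd;            /* domain this XRC SRQ belongs to */
};

/* Allocate a domain only if the device advertises XRC support. */
struct model_xrcd *model_alloc_xrcd(struct model_device *dev)
{
	struct model_xrcd *xrcd;

	if (!(dev->cap_flags & MODEL_DEVICE_XRC))
		return NULL;
	xrcd = calloc(1, sizeof(*xrcd));
	if (xrcd)
		xrcd->device = dev;
	return xrcd;
}

/* Free a domain; refuse while SRQs still reference it (assumed rule). */
int model_dealloc_xrcd(struct model_xrcd *xrcd)
{
	if (xrcd->usecnt)
		return -EBUSY;
	free(xrcd);
	return 0;
}

/* Attach an SRQ to a domain. */
void model_srq_init(struct model_srq *srq, struct model_xrcd *xrcd)
{
	assert(xrcd != NULL);
	srq->xrcd = xrcd;
	xrcd->usecnt++;
}

/* A message arriving on a QP in qp_xrcd may only target SRQs in that domain. */
bool model_srq_accepts(const struct model_srq *srq,
		       const struct model_xrcd *qp_xrcd)
{
	return srq->xrcd == qp_xrcd;
}
```

For example, an SRQ attached to domain a is accepted for a QP in a but rejected for a QP in domain b. Domain a cannot be freed while that SRQ still uses it.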
  2. 19 Jul 2011, 4 commits
  3. 18 Jul 2011, 1 commit
  4. 05 Jul 2011, 1 commit
  5. 10 Jun 2011, 1 commit
    • rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679
      Committed by Greg Rose
      The message size allocated for rtnl ifinfo dumps was limited to
      a single page.  This is not enough for additional interface info
      available with devices that support SR-IOV and caused a bug in
      which VF info would not be displayed if more than approximately
      40 VFs were created per interface.
      
      Add a new function pointer to the rtnl_register service that calculates
      the amount of data required for the ifinfo dump, so that a buffer large
      enough to satisfy the request can be allocated.
      Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
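The sizing idea in the rtnetlink commit above can be sketched in plain C. This is a hypothetical userspace model, not the rtnetlink code. The model_* names, the per-VF byte count and the 4096-byte page are placeholders. A size hook walks the interfaces and takes the largest message any one of them needs. The dump buffer is then allocated with at least that many bytes instead of a fixed single page.

```c
/* Userspace model only; names and byte counts are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u

struct model_ifinfo {
	size_t base_size;       /* fixed attributes: name, MTU, stats, ... */
	unsigned int num_vfs;   /* SR-IOV virtual functions */
	size_t per_vf_size;     /* attributes emitted for each VF */
};

static size_t model_if_msg_size(const struct model_ifinfo *ifi)
{
	return ifi->base_size + (size_t)ifi->num_vfs * ifi->per_vf_size;
}

/* Size hook: the largest single-interface message the dump must hold. */
size_t model_min_dump_size(const struct model_ifinfo *ifs, size_t n)
{
	size_t max = 0;

	for (size_t i = 0; i < n; i++) {
		size_t sz = model_if_msg_size(&ifs[i]);
		if (sz > max)
			max = sz;
	}
	return max;
}

/* Allocation: at least one page, more when some interface needs it. */
size_t model_dump_alloc_size(const struct model_ifinfo *ifs, size_t n)
{
	size_t need = model_min_dump_size(ifs, n);

	assert(n == 0 || need > 0);
	return need > MODEL_PAGE_SIZE ? need : MODEL_PAGE_SIZE;
}
```

Assume a placeholder of 100 bytes per VF on top of a 1000-byte base. An interface with 10 VFs then needs 2000 bytes and still fits in one 4096-byte page. One with 64 VFs needs 7400 bytes, which forces a larger buffer.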
  6. 26 May 2011, 4 commits
    • RDMA/cma: Save PID of ID's owner · 83e9502d
      Committed by Nir Muchtar
      Save the PID associated with an RDMA CM ID for reporting via netlink.
      Signed-off-by: Nir Muchtar <nirm@voltaire.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/cma: Add support for netlink statistics export · 753f618a
      Committed by Nir Muchtar
      Add callbacks and data types for statistics export of all current
      devices/ids.  The schema for RDMA CM is a series of netlink messages.
      Each one contains an rdma_cm_stat struct.  Additionally, two netlink
      attributes are created for the addresses for each message (if
      applicable).
      
      The attribute types used are:
      RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (The source address for this ID)
      RDMA_NL_RDMA_CM_ATTR_DST_ADDR (The destination address for this ID)
      sockaddr_* structs are encapsulated within these attributes.
      
      In other words, every transaction contains a series of messages like:
      
      -------message 1-------
      struct rdma_cm_id_stats {
             __u32 qp_num;
             __u32 bound_dev_if;
             __u32 port_space;
             __s32 pid;
             __u8 cm_state;
             __u8 node_type;
             __u8 port_num;
             __u8 reserved;
      }
      RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address
      RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address
      -------end 1-------
      -------message 2-------
      struct rdma_cm_id_stats
      RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute
      RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute
      -------end 2-------
      Signed-off-by: Nir Muchtar <nirm@voltaire.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
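The following userspace sketch makes the message layout in the statistics-export commit above concrete. It serializes one message: the stats struct, followed by two netlink-style attributes. Each attribute has a 4-byte length/type header and is padded to a 4-byte boundary. The struct copies the layout given in the commit. Everything else is an assumption for illustration: the attribute values 1 and 2, the model_* helpers and the use of sockaddr_in. The kernel itself would use its netlink helpers rather than hand-rolled packing.

```c
/* Userspace sketch; attribute values and model_* helpers are assumptions. */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Same field layout as struct rdma_cm_id_stats in the commit message. */
struct rdma_cm_id_stats {
	uint32_t qp_num;
	uint32_t bound_dev_if;
	uint32_t port_space;
	int32_t  pid;
	uint8_t  cm_state;
	uint8_t  node_type;
	uint8_t  port_num;
	uint8_t  reserved;
};

enum { MODEL_ATTR_SRC_ADDR = 1, MODEL_ATTR_DST_ADDR = 2 };  /* assumed values */

/* netlink-style attribute header: total length (incl. header), then type */
struct model_nlattr { uint16_t nla_len; uint16_t nla_type; };

#define MODEL_ALIGN4(len) (((size_t)(len) + 3) & ~(size_t)3)

/* Append one attribute at buf+off; returns the new offset, or 0 if it does not fit. */
static size_t model_put_attr(uint8_t *buf, size_t cap, size_t off,
			     uint16_t type, const void *data, uint16_t len)
{
	struct model_nlattr hdr = {
		(uint16_t)(sizeof(struct model_nlattr) + len), type
	};
	size_t total = MODEL_ALIGN4(sizeof(hdr) + len);

	if (off + total > cap)
		return 0;
	memset(buf + off, 0, total);                 /* zero the padding */
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), data, len);
	return off + total;
}

/* One message: stats struct, then SRC and DST address attributes. */
size_t model_build_cm_msg(uint8_t *buf, size_t cap,
			  const struct rdma_cm_id_stats *st,
			  const struct sockaddr_in *src,
			  const struct sockaddr_in *dst)
{
	size_t off = MODEL_ALIGN4(sizeof(*st));

	assert(buf != NULL && st != NULL);
	if (off > cap)
		return 0;
	memset(buf, 0, off);
	memcpy(buf, st, sizeof(*st));
	off = model_put_attr(buf, cap, off, MODEL_ATTR_SRC_ADDR, src, sizeof(*src));
	if (!off)
		return 0;
	return model_put_attr(buf, cap, off, MODEL_ATTR_DST_ADDR, dst, sizeof(*dst));
}
```

A reader walks such a message the same way. It skips the aligned struct, reads each attribute header, and advances by the aligned nla_len.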
    • RDMA/cma: Pass QP type into rdma_create_id() · b26f9b99
      Committed by Sean Hefty
      The RDMA CM currently infers the QP type from the port space selected
      by the user.  In the future (e.g. with RDMA_PS_IB or XRC), there may not
      be a 1-1 correspondence between port space and QP type.  For netlink
      export of RDMA CM state, we want to export the QP type to userspace,
      so it is cleaner to explicitly associate a QP type to an ID.
      
      Modify rdma_create_id() to allow the user to specify the QP type, and
      use it to choose between datagram and connected mode.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
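Here is a small model of the selection logic in the commit above. The ID records the QP type it was created with, and datagram versus connected behaviour is derived from that type rather than from the port space. The model_* enums and functions are hypothetical stand-ins. In the kernel, the change shows up as an extra QP-type argument to rdma_create_id(), so a caller passes, for example, RDMA_PS_TCP together with IB_QPT_RC.

```c
/* Userspace model only; enums and functions are hypothetical stand-ins. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_qp_type { MODEL_QPT_RC, MODEL_QPT_UC, MODEL_QPT_UD };
enum model_port_space { MODEL_PS_TCP, MODEL_PS_UDP, MODEL_PS_IPOIB, MODEL_PS_IB };

struct model_cm_id {
	enum model_port_space ps;
	enum model_qp_type qp_type;   /* recorded explicitly at creation */
};

void model_create_id(struct model_cm_id *id, enum model_port_space ps,
		     enum model_qp_type qp_type)
{
	assert(id != NULL);
	id->ps = ps;
	id->qp_type = qp_type;
}

/* Datagram vs connected mode follows the QP type, not the port space. */
bool model_is_datagram(const struct model_cm_id *id)
{
	return id->qp_type == MODEL_QPT_UD;
}
```

With this split, one port space (MODEL_PS_IB here) can carry both connected and datagram IDs. This is exactly the case where inferring the QP type from the port space breaks down.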
    • RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h> · 550e5ca7
      Committed by Nir Muchtar
      Move cma.c's internal definition of enum cma_state to enum rdma_cm_state
      in an exported header so that it can be exported via RDMA netlink.
      Signed-off-by: Nir Muchtar <nirm@voltaire.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  7. 24 May 2011, 4 commits
  8. 21 May 2011, 2 commits
  9. 11 May 2011, 1 commit
  10. 10 May 2011, 3 commits
    • RDMA/iwcm: Get rid of enum iw_cm_event_status · d0c49bf3
      Committed by Roland Dreier
      The IW_CM_EVENT_STATUS_xxx values were used in only a couple of places;
      cma.c uses -Exxx values instead, and so do the amso1100, cxgb3 and cxgb4
      drivers -- only nes was using the enum values (with the mild consequence
      that all nes connection failures were treated as generic errors rather
      than reported as timeouts or rejections).
      
      We can fix this confusion by getting rid of enum iw_cm_event_status and
      using a plain int for struct iw_cm_event.status, and converting nes to
      use -Exxx as the other iWARP drivers do.
      
      This also gets rid of the warning
      
          drivers/infiniband/core/cma.c: In function 'cma_iw_handler':
          drivers/infiniband/core/cma.c:1333:3: warning: case value '4294967185' not in enumerated type 'enum iw_cm_event_status'
          drivers/infiniband/core/cma.c:1336:3: warning: case value '4294967186' not in enumerated type 'enum iw_cm_event_status'
          drivers/infiniband/core/cma.c:1332:3: warning: case value '4294967192' not in enumerated type 'enum iw_cm_event_status'
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Sean Hefty <sean.hefty@intel.com>
      Reviewed-by: Faisal Latif <faisal.latif@intel.com>
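Once status in the commit above is a plain int holding 0 or a negative errno, a CM handler can switch on the -Exxx values directly. Below is a minimal sketch of such a mapping. The model_* event names are hypothetical, and the errno-to-event assignment is an assumption made for illustration.

The warnings quoted in that commit are the same values printed as unsigned 32-bit numbers. On Linux, ECONNREFUSED is 111, so -ECONNREFUSED appears as 2^32 - 111 = 4294967185. Likewise, -ETIMEDOUT (110) gives 4294967186 and -ECONNRESET (104) gives 4294967192.

```c
/* Userspace sketch; event names and the exact mapping are assumptions. */
#include <assert.h>
#include <errno.h>

enum model_cm_event {
	MODEL_EVENT_ESTABLISHED,
	MODEL_EVENT_REJECTED,
	MODEL_EVENT_UNREACHABLE,
	MODEL_EVENT_CONNECT_ERROR,
};

/* status is a plain int: 0 on success, otherwise a negative errno. */
enum model_cm_event model_map_iw_status(int status)
{
	assert(status <= 0);
	switch (status) {
	case 0:
		return MODEL_EVENT_ESTABLISHED;
	case -ECONNRESET:
	case -ECONNREFUSED:
		return MODEL_EVENT_REJECTED;
	case -ETIMEDOUT:
		return MODEL_EVENT_UNREACHABLE;
	default:
		return MODEL_EVENT_CONNECT_ERROR;
	}
}
```

A driver that reports a generic error code for every failure (as nes did before the conversion) always lands in the default branch. That is how timeouts and rejections ended up reported as generic errors.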
    • RDMA/cma: Add an ID_REUSEADDR option · a9bb7912
      Committed by Hefty, Sean
      Lustre requires that clients bind to a privileged port number before
      connecting to a remote server.  On larger clusters (typically more
      than about 1000 nodes), the number of privileged ports is exhausted,
      resulting in Lustre being unusable.
      
      To handle this, we add support for reusable addresses to the rdma_cm.
      This mimics the behavior of the socket option SO_REUSEADDR.  A user
      may set an rdma_cm_id to reuse an address before calling
      rdma_bind_addr() (explicitly or implicitly).  If set, other
      rdma_cm_ids may be bound to the same address, provided that they all
      have reuse enabled, and there are no active listens.
      
      If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it
      will only succeed if there are no other IDs bound to that same
      address.  The reuse option is exported to user space.  The behavior of
      the kernel reuse implementation was verified against that given by
      sockets.
      
      This patch is derived from a patch by Ira Weiny <weiny2@llnl.gov>.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
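The binding rules in the ID_REUSEADDR commit above can be written as a small userspace model. It follows the rules as stated in the commit and is not the rdma_cm implementation. The model_* types, the fixed-size table of IDs per address, and the use of -EADDRINUSE for every refusal are assumptions.

```c
/* Userspace model of the stated reuse rules; all names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_id {
	bool reuseaddr;      /* set before binding */
	bool listening;
};

/* All IDs currently bound to one address. */
struct model_port {
	struct model_id *ids[8];
	size_t n;
};

/* May 'id' share this address with the IDs already bound to it? */
static bool model_can_bind(const struct model_port *p, const struct model_id *id)
{
	if (p->n == 0)
		return true;
	if (!id->reuseaddr)
		return false;
	for (size_t i = 0; i < p->n; i++)
		if (!p->ids[i]->reuseaddr || p->ids[i]->listening)
			return false;  /* every owner must allow reuse; no active listens */
	return true;
}

int model_bind(struct model_port *p, struct model_id *id)
{
	if (p->n == sizeof(p->ids) / sizeof(p->ids[0]) || !model_can_bind(p, id))
		return -EADDRINUSE;
	p->ids[p->n++] = id;
	return 0;
}

/* A reuse-enabled ID may listen only if no other ID shares its address. */
int model_listen(const struct model_port *p, struct model_id *id)
{
	assert(p->n > 0);    /* the caller has already bound 'id' to p */
	if (id->reuseaddr && p->n > 1)
		return -EADDRINUSE;
	id->listening = true;
	return 0;
}
```

This mirrors SO_REUSEADDR. Many clients may share a privileged port as long as none of them listens, which is the Lustre scenario described in the commit.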
    • RDMA/cma: Fix handling of IPv6 addressing in cma_use_port · 43b752da
      Committed by Hefty, Sean
      cma_use_port() assumes that the sockaddr is an IPv4 address.  Since
      IPv6 addressing is supported (and also to support other address
      families), make the code more generic in its address handling.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
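One way to make address handling family-agnostic, as the commit above asks, is to dispatch on sa_family whenever addresses are compared. The helpers below are hypothetical and only illustrate the idea. The conflict rule (within one family, a wildcard address or an identical address counts as a conflict) is an assumption about what a port-use check needs, not a transcription of cma_use_port().

```c
/* Hypothetical helpers illustrating per-family address handling. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

bool model_addr_is_any(const struct sockaddr *sa)
{
	switch (sa->sa_family) {
	case AF_INET:
		return ((const struct sockaddr_in *)sa)->sin_addr.s_addr ==
		       htonl(INADDR_ANY);
	case AF_INET6:
		return IN6_IS_ADDR_UNSPECIFIED(
			&((const struct sockaddr_in6 *)sa)->sin6_addr);
	default:
		return false;
	}
}

bool model_addr_equal(const struct sockaddr *a, const struct sockaddr *b)
{
	if (a->sa_family != b->sa_family)
		return false;
	switch (a->sa_family) {
	case AF_INET:
		return ((const struct sockaddr_in *)a)->sin_addr.s_addr ==
		       ((const struct sockaddr_in *)b)->sin_addr.s_addr;
	case AF_INET6:
		return memcmp(&((const struct sockaddr_in6 *)a)->sin6_addr,
			      &((const struct sockaddr_in6 *)b)->sin6_addr,
			      sizeof(struct in6_addr)) == 0;
	default:
		return false;
	}
}

/* Assumed rule: same family, and either side is a wildcard or both are equal. */
bool model_addr_conflicts(const struct sockaddr *a, const struct sockaddr *b)
{
	assert(a != NULL && b != NULL);
	if (a->sa_family != b->sa_family)
		return false;   /* cross-family (dual-stack) handling left out */
	return model_addr_is_any(a) || model_addr_is_any(b) ||
	       model_addr_equal(a, b);
}
```

The IPv4-only version effectively ran the AF_INET branch for every caller, so IPv6 addresses were compared incorrectly.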
  11. 19 Mar 2011, 1 commit
  12. 18 Mar 2011, 1 commit
  13. 16 Mar 2011, 4 commits
    • RDMA/cma: Replace global lock in rdma_destroy_id() with id-specific one · a396d43a
      Committed by Sean Hefty
      rdma_destroy_id currently uses the global rdma cm 'lock' to test if an
      rdma_cm_id has been bound to a device.  This prevents an active
      address resolution callback handler from assigning a device to the
      rdma_cm_id after rdma_destroy_id checks for one.
      
      Instead, we can replace the use of the global lock around the check to
      the rdma_cm_id device pointer by setting the id state to destroying,
      then flushing all active callbacks.  The latter is accomplished by
      acquiring and releasing the handler_mutex.  Any active handler will
      complete first, and any newly scheduled handlers will find the
      rdma_cm_id in an invalid state.
      
      In addition to optimizing the current locking scheme, the use of the
      rdma_cm_id mutex is a more intuitive synchronization mechanism than
      that of the global lock.  These changes are based on feedback from
      Doug Ledford <dledford@redhat.com> while he was trying to debug a
      crash in the rdma cm destroy path.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
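The flush technique in the rdma_destroy_id() commit above can be modelled with pthreads. It has two steps:

1. Publish a destroying state under the per-ID lock.
2. Take and drop the handler mutex, so that any running handler finishes first.

The model_* names, the state set and the per-ID state lock are hypothetical simplifications of the rdma_cm code.

```c
/* Userspace model of "set DESTROYING, then flush handlers"; names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_ADDR_QUERY, MODEL_ADDR_RESOLVED, MODEL_DESTROYING };

struct model_cm_id {
	pthread_mutex_t lock;          /* per-ID lock protecting state */
	pthread_mutex_t handler_mutex; /* held while a callback runs */
	enum model_state state;
	void *device;                  /* set once address resolution binds a device */
};

void model_id_init(struct model_cm_id *id)
{
	pthread_mutex_init(&id->lock, NULL);
	pthread_mutex_init(&id->handler_mutex, NULL);
	id->state = MODEL_ADDR_QUERY;
	id->device = NULL;
}

/* Change state from 'from' to 'to' only if nobody else changed it first. */
static bool model_comp_exch(struct model_cm_id *id, enum model_state from,
			    enum model_state to)
{
	bool ok;

	pthread_mutex_lock(&id->lock);
	ok = (id->state == from);
	if (ok)
		id->state = to;
	pthread_mutex_unlock(&id->lock);
	return ok;
}

/* Address-resolution callback: attaches a device only while the ID is live. */
bool model_addr_handler(struct model_cm_id *id, void *dev)
{
	bool attached = false;

	pthread_mutex_lock(&id->handler_mutex);
	if (model_comp_exch(id, MODEL_ADDR_QUERY, MODEL_ADDR_RESOLVED)) {
		id->device = dev;
		attached = true;
	}
	pthread_mutex_unlock(&id->handler_mutex);
	return attached;
}

/* Returns true if a device had been attached and must be released. */
bool model_destroy_id(struct model_cm_id *id)
{
	assert(id != NULL);
	pthread_mutex_lock(&id->lock);
	id->state = MODEL_DESTROYING;            /* later handlers will bail out */
	pthread_mutex_unlock(&id->lock);

	pthread_mutex_lock(&id->handler_mutex);  /* waits for a running handler */
	pthread_mutex_unlock(&id->handler_mutex);

	/* No handler can set id->device any more, so no global lock is needed. */
	return id->device != NULL;
}
```

A handler that was already running completes before destroy proceeds. A handler scheduled afterwards fails the state check and never attaches a device.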
    • IB/cm: Cancel pending LAP message when exiting IB_CM_ESTABLISH state · 8d8ac865
      Committed by Sean Hefty
      This problem was reported by Moni Shoua <monis@mellanox.com> and Amir
      Vadai <amirv@mellanox.com>:
      
      	When destroying a cm_id from a context of a work queue and if
      	the lap_state of this cm_id is IB_CM_LAP_SENT, we need to
      	release the reference of this id that was taken upon the send
      	of the LAP message.  Otherwise, if the expected APR message
      	gets lost, the reference will be released only after a long
      	time, and during that time the work handler thread is not
      	available to process other things.
      
      It turns out that we need to cancel any pending LAP messages whenever
      we transition out of the IB_CM_ESTABLISHED state.  This occurs when
      disconnecting - either sending or receiving a DREQ.  It can also
      happen in a corner case where we receive a REJ message after sending
      an RTU, followed by a LAP.  Add checks and cancel any outstanding LAP
      messages in these three cases.
      
      Canceling the LAP when sending a DREQ fixes the destroy problem
      reported by Moni.  When a cm_id is destroyed in the IB_CM_ESTABLISHED
      state, it sends a DREQ to the remote side to notify the peer that the
      connection is going away.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
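The fix in the LAP commit above amounts to routing every exit from the established state through one place. That place cancels an outstanding LAP and drops the reference the LAP holds. A userspace model of the idea follows. All model_* names are hypothetical, and the state entered after a REJ (timewait here) is an assumption.

```c
/* Userspace model; states and functions are hypothetical stand-ins. */
#include <assert.h>

enum model_cm_state {
	MODEL_CM_ESTABLISHED,
	MODEL_CM_DREQ_SENT,
	MODEL_CM_DREQ_RCVD,
	MODEL_CM_TIMEWAIT,
};

enum model_lap_state { MODEL_LAP_IDLE, MODEL_LAP_SENT };

struct model_cm_id {
	enum model_cm_state state;
	enum model_lap_state lap_state;
	int refcount;                /* +1 while a LAP is outstanding */
};

void model_send_lap(struct model_cm_id *id)
{
	assert(id->state == MODEL_CM_ESTABLISHED);
	id->lap_state = MODEL_LAP_SENT;
	id->refcount++;              /* held until an APR arrives */
}

/* Drop the LAP's reference now instead of waiting for a lost APR. */
static void model_cancel_lap(struct model_cm_id *id)
{
	if (id->lap_state == MODEL_LAP_SENT) {
		id->lap_state = MODEL_LAP_IDLE;
		id->refcount--;
	}
}

/* Single exit path from ESTABLISHED: always cancels a pending LAP. */
static void model_leave_established(struct model_cm_id *id,
				    enum model_cm_state next)
{
	if (id->state == MODEL_CM_ESTABLISHED)
		model_cancel_lap(id);
	id->state = next;
}

void model_send_dreq(struct model_cm_id *id) { model_leave_established(id, MODEL_CM_DREQ_SENT); }
void model_recv_dreq(struct model_cm_id *id) { model_leave_established(id, MODEL_CM_DREQ_RCVD); }
void model_recv_rej(struct model_cm_id *id)  { model_leave_established(id, MODEL_CM_TIMEWAIT); }
```

Destroying an established ID sends a DREQ, so cancelling in the DREQ path is what releases the LAP reference in the reported destroy scenario.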
    • IB/cm: Bump reference count on cm_id before invoking callback · 29963437
      Committed by Sean Hefty
      When processing a SIDR REQ, the ib_cm allocates a new cm_id.  The
      refcount of the cm_id is initialized to 1.  However, cm_process_work
      will decrement the refcount after invoking all callbacks.  The result
      is that the cm_id will end up with refcount set to 0 by the end of the
      sidr req handler.
      
      If a user tries to destroy the cm_id, the destruction will proceed,
      under the incorrect assumption that no other threads are referencing
      the cm_id.  This can lead to a crash when the cm callback thread tries
      to access the cm_id.
      
      This problem was noticed as part of a larger investigation into kernel
      crashes in the rdma_cm when running on a real-time OS.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Acked-by: Doug Ledford <dledford@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
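The reference-count arithmetic in the SIDR REQ commit above can be checked with a tiny model. A fresh ID starts at 1, and the work-processing step drops one reference after the callbacks. Without an extra reference, the count therefore ends at 0 while the user still holds the ID. The model_* names are hypothetical, and the functions track only the counter, not the real cm_id lifecycle.

```c
/* Counter-only model; names are hypothetical. */
#include <assert.h>

struct model_cm_id {
	int refcount;
};

/* Mimics the work-processing step: run callbacks, then drop one reference. */
static void model_process_work(struct model_cm_id *id)
{
	/* ... user callbacks would run here ... */
	assert(id->refcount > 0);
	id->refcount--;
}

/* Buggy order: the new ID starts at 1 and process_work drops it to 0. */
int model_sidr_req_buggy(struct model_cm_id *id)
{
	id->refcount = 1;        /* the user's reference */
	model_process_work(id);
	return id->refcount;     /* 0: looks unreferenced while still in use */
}

/* Fixed order: take a reference for process_work before invoking callbacks. */
int model_sidr_req_fixed(struct model_cm_id *id)
{
	id->refcount = 1;
	id->refcount++;          /* the bump added by the commit */
	model_process_work(id);
	return id->refcount;     /* 1: the user's reference survives */
}
```

A count of 0 is what lets a concurrent destroy proceed even though the callback thread may still touch the cm_id.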
    • RDMA/cma: Fix crash in request handlers · 25ae21a1
      Committed by Sean Hefty
      Doug Ledford and Red Hat reported a crash when running the rdma_cm on
      a real-time OS.  The crash has the following call trace:
      
          cm_process_work
             cma_req_handler
                cma_disable_callback
                rdma_create_id
                   kzalloc
                   init_completion
                cma_get_net_info
                cma_save_net_info
                cma_any_addr
                   cma_zero_addr
                rdma_translate_ip
                   rdma_copy_addr
                cma_acquire_dev
                   rdma_addr_get_sgid
                   ib_find_cached_gid
                   cma_attach_to_dev
                ucma_event_handler
                   kzalloc
                   ib_copy_ah_attr_to_user
                cma_comp
      
      [ preempted ]
      
          cma_write
              copy_from_user
              ucma_destroy_id
                 copy_from_user
                 _ucma_find_context
                 ucma_put_ctx
                 ucma_free_ctx
                    rdma_destroy_id
                       cma_exch
                       cma_cancel_operation
                       rdma_node_get_transport
      
              rt_mutex_slowunlock
              bad_area_nosemaphore
              oops_enter
      
      They were able to reproduce the crash multiple times with the
      following details:
      
          Crash seems to always happen on the:
                  mutex_unlock(&conn_id->handler_mutex);
          as conn_id looks to have been freed during this code path.
      
      An examination of the code shows that a race exists in the request
      handlers.  When a new connection request is received, the rdma_cm
      allocates a new connection identifier.  This identifier has a single
      reference count on it.  If a user calls rdma_destroy_id() from another
      thread after receiving a callback, rdma_destroy_id will proceed to
      destroy the id and free the associated memory.  However, the request
      handlers may still be in the process of running.  When control returns
      to the request handlers, they can attempt to access the newly created
      identifiers.
      
      Fix this by holding a reference on the newly created rdma_cm_id until
      the request handler is through accessing it.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Acked-by: Doug Ledford <dledford@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
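The fix in the request-handler commit above follows a common pattern: take a reference before handing an object to code that may drop the last user reference. Below is a userspace sketch using C11 atomics. The model_* names and the freed flag used to observe the release are hypothetical. The callback stands in for the user calling rdma_destroy_id(), which in the reported crash happened on another thread; the model runs it synchronously.

```c
/* Userspace sketch with C11 atomics; names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_conn_id {
	atomic_int refcount;
	bool *freed;                 /* observation hook: set when memory is released */
};

static void model_deref(struct model_conn_id *id)
{
	/* atomic_fetch_sub returns the old value; 1 means this was the last ref */
	if (atomic_fetch_sub(&id->refcount, 1) == 1) {
		*id->freed = true;
		free(id);
	}
}

/* What the user does in rdma_destroy_id(): drop the user's reference. */
void model_destroy_id(struct model_conn_id *id) { model_deref(id); }

/* A user that keeps the new connection. */
void model_keep_id(struct model_conn_id *id) { (void)id; }

/* Request handler with the fix. Returns the refcount seen after the user
 * callback, or -1 on allocation failure. */
int model_req_handler(bool *freed, void (*user_cb)(struct model_conn_id *))
{
	struct model_conn_id *conn_id = malloc(sizeof(*conn_id));
	int seen;

	if (!conn_id)
		return -1;
	atomic_init(&conn_id->refcount, 1);       /* reference handed to the user */
	conn_id->freed = freed;
	*freed = false;

	atomic_fetch_add(&conn_id->refcount, 1);  /* handler's own reference */
	user_cb(conn_id);                          /* user may destroy the ID here */

	/* Stand-in for mutex_unlock(&conn_id->handler_mutex): still safe. */
	seen = atomic_load(&conn_id->refcount);
	assert(seen > 0);
	model_deref(conn_id);                      /* the last reference frees it */
	return seen;
}
```

If the user destroys the ID during the callback, the handler still sees a count of 1 and frees the memory only on its own final deref. Without the extra reference, that access would hit freed memory.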
  14. 13 Mar 2011, 3 commits
  15. 03 Mar 2011, 1 commit
  16. 29 Jan 2011, 2 commits
  17. 17 Jan 2011, 1 commit
    • RDMA: Update workqueue usage · f0626710
      Committed by Tejun Heo
      * ib_wq is added, which is used as the common workqueue for infiniband
        instead of the system workqueue.  All system workqueue usages
        including flush_scheduled_work() callers are converted to use and
        flush ib_wq.
      
      * cancel_delayed_work() + flush_scheduled_work() converted to
        cancel_delayed_work_sync().
      
      * qib_wq is removed and ib_wq is used instead.
      
      This is to prepare for deprecation of flush_scheduled_work().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  18. 09 Dec 2010, 1 commit
    • IB/uverbs: Handle large number of entries in poll CQ · 7182afea
      Committed by Dan Carpenter
      In ib_uverbs_poll_cq() code there is a potential integer overflow if
      userspace passes in a large cmd.ne.  The calls to kmalloc() would
      allocate smaller buffers than intended, leading to memory corruption.
      There is also an information leak if resp was not entirely filled.
      Unprivileged userspace may call this function, although only if an
      RDMA device that uses this function is present.
      
      Fix this by copying CQ entries one at a time, which avoids the
      allocation entirely, and also by moving this copying into a function
      that makes sure to initialize all memory copied to userspace.
      
      Special thanks to Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      for his help and advice.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      
      [ Monkey around with things a bit to avoid bad code generation by gcc
        when designated initializers are used.  - Roland ]
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
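Both halves of the poll-CQ fix above can be illustrated in userspace:

- The first function shows how a 32-bit byte count computed from an untrusted entry count wraps.
- The second copies completions one at a time into a zeroed temporary. No size is ever multiplied, and no uninitialized bytes reach the destination.

struct model_wc, the model_* functions and the plain memcpy standing in for copy_to_user() are all hypothetical.

```c
/* Userspace illustration; all names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct model_wc {
	uint64_t wr_id;
	uint32_t status;
	uint32_t qp_num;
};

/* The bug pattern: a 32-bit size computed from an untrusted count can wrap. */
uint32_t model_bytes_unchecked(uint32_t ne)
{
	return ne * (uint32_t)sizeof(struct model_wc);
}

struct model_cq {
	const struct model_wc *entries;
	size_t count;
	size_t next;
};

static int model_poll_one(struct model_cq *cq, struct model_wc *wc)
{
	if (cq->next == cq->count)
		return 0;
	*wc = cq->entries[cq->next++];
	return 1;
}

/* Fix pattern: no size multiplication, one fully initialized entry at a time. */
size_t model_poll_cq_copy(struct model_cq *cq, uint32_t ne,
			  struct model_wc *out, size_t out_cap)
{
	size_t n = 0;

	while (n < ne && n < out_cap) {
		struct model_wc wc;

		memset(&wc, 0, sizeof(wc));  /* nothing stale can be copied out */
		if (!model_poll_one(cq, &wc))
			break;
		memcpy(&out[n], &wc, sizeof(wc));  /* stand-in for copy_to_user() */
		n++;
	}
	return n;
}
```

With the 16-byte model_wc, ne = 0x10000000 wraps the byte count to 0. The per-entry copy simply stops when the CQ is empty or the destination is full.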
  19. 02 Dec 2010, 2 commits
  20. 25 Nov 2010, 1 commit
  21. 12 Nov 2010, 1 commit
    • net: get rid of rtable->idev · 72cdd1d9
      Committed by Eric Dumazet
      It seems the idev field in struct rtable serves no special purpose
      other than adding extra atomic ops.
      
      We hold refcounts on the device itself (using percpu data, so pretty
      cheap in current kernel).
      
      The InfiniBand case is handled by using dst.dev instead of idev->dev.
      
      With this field removed, routing without the route cache uses only
      shared data and percpu data; the only potential contention is a pair
      of atomic ops on struct neighbour per forwarded packet.
      
      This gives about a 5% speedup on a routing test.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Roland Dreier <rolandd@cisco.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>