1. 18 Jun, 2011 2 commits
  2. 26 May, 2011 4 commits
    • RDMA/cma: Save PID of ID's owner · 83e9502d
      Nir Muchtar committed
      Save the PID associated with an RDMA CM ID for reporting via netlink.
      Signed-off-by: Nir Muchtar <nirm@voltaire.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/cma: Add support for netlink statistics export · 753f618a
      Nir Muchtar committed
      Add callbacks and data types for statistics export of all current
      devices/ids.  The schema for RDMA CM is a series of netlink messages.
      Each one contains an rdma_cm_stat struct.  Additionally, two netlink
      attributes are created for the addresses for each message (if
      applicable).
      
      The attribute types used are:
      RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (The source address for this ID)
      RDMA_NL_RDMA_CM_ATTR_DST_ADDR (The destination address for this ID)
      sockaddr_* structs are encapsulated within these attributes.
      
      In other words, every transaction contains a series of messages like:
      
      -------message 1-------
      struct rdma_cm_id_stats {
             __u32 qp_num;
             __u32 bound_dev_if;
             __u32 port_space;
             __s32 pid;
             __u8 cm_state;
             __u8 node_type;
             __u8 port_num;
             __u8 reserved;
      }
      RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address
      RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address
      -------end 1-------
      -------message 2-------
      struct rdma_cm_id_stats
      RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute
      RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute
      -------end 2-------
      Signed-off-by: Nir Muchtar <nirm@voltaire.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
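The message layout described in the commit above (a fixed `rdma_cm_id_stats` struct followed by optional address attributes) can be walked in userspace roughly as follows. This is only a sketch under stated assumptions: the struct is copied from the commit message with fixed-width types, the attribute header is assumed to follow the generic netlink length/type convention with 4-byte padding, and `attr_hdr`, `parsed_id`, `parse_id_message()` and the numeric attribute values are hypothetical names invented for illustration, not taken from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed-size part of each message, as listed in the commit message
 * (uint32_t etc. stand in for __u32/__s32/__u8). */
struct rdma_cm_id_stats {
    uint32_t qp_num;
    uint32_t bound_dev_if;
    uint32_t port_space;
    int32_t  pid;
    uint8_t  cm_state;
    uint8_t  node_type;
    uint8_t  port_num;
    uint8_t  reserved;
};

/* Assumption: attributes carry a 16-bit length (header included) and a
 * 16-bit type, and each attribute is padded to a 4-byte boundary. */
struct attr_hdr {
    uint16_t len;
    uint16_t type;
};

/* Hypothetical numeric values, for illustration only. */
enum { ATTR_SRC_ADDR = 1, ATTR_DST_ADDR = 2 };

#define ATTR_ALIGN(n) (((size_t)(n) + 3u) & ~(size_t)3u)

struct parsed_id {
    struct rdma_cm_id_stats stats;
    const uint8_t *src;  /* points into the caller's buffer, or NULL */
    size_t src_len;
    const uint8_t *dst;  /* points into the caller's buffer, or NULL */
    size_t dst_len;
};

/* Parse one message payload. Returns 0 on success, -1 if malformed. */
int parse_id_message(const uint8_t *buf, size_t len, struct parsed_id *out)
{
    size_t off = sizeof(out->stats);

    if (len < off)
        return -1;
    memset(out, 0, sizeof(*out));
    memcpy(&out->stats, buf, sizeof(out->stats));

    /* Walk the attributes that follow the fixed struct. */
    while (len - off >= sizeof(struct attr_hdr)) {
        struct attr_hdr h;
        memcpy(&h, buf + off, sizeof(h));
        if (h.len < sizeof(h) || h.len > len - off)
            return -1;

        const uint8_t *payload = buf + off + sizeof(h);
        size_t payload_len = h.len - sizeof(h);
        if (h.type == ATTR_SRC_ADDR) {
            out->src = payload;
            out->src_len = payload_len;
        } else if (h.type == ATTR_DST_ADDR) {
            out->dst = payload;
            out->dst_len = payload_len;
        }

        size_t step = ATTR_ALIGN(h.len);
        if (step > len - off)
            break;  /* final attribute without trailing padding */
        off += step;
    }
    return 0;
}
```

A caller would pass the payload of each netlink message in the dump to `parse_id_message()` and then interpret `src` and `dst` as `sockaddr_*` structs, checking the address family first.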
    • RDMA/cma: Pass QP type into rdma_create_id() · b26f9b99
      Sean Hefty committed
      The RDMA CM currently infers the QP type from the port space selected
      by the user.  In the future (e.g. with RDMA_PS_IB or XRC), there may not
      be a 1-1 correspondence between port space and QP type.  For netlink
      export of RDMA CM state, we want to export the QP type to userspace,
      so it is cleaner to explicitly associate a QP type to an ID.
      
      Modify rdma_create_id() to allow the user to specify the QP type, and
      use it to make our selections of datagram versus connected mode.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h> · 550e5ca7
      Nir Muchtar committed
      Move cma.c's internal definition of enum cma_state to enum rdma_cm_state
      in an exported header so that it can be exported via RDMA netlink.
      Signed-off-by: Nir Muchtar <nirm@voltaire.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  3. 25 May, 2011 3 commits
    • RDMA/nes: Add a check for strict_strtoul() · 52f81dba
      Liu Yuan committed
      It should check if strict_strtoul() succeeds before using
      'wqm_quanta_value'.
      Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
      
      [ Convert to kstrtoul() directly while we're here.  - Roland ]
      Signed-off-by: Roland Dreier <roland@purestorage.com>
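The pattern behind the commit above (do not use a parsed value unless the conversion reported success) can be illustrated in userspace with the standard `strtoul()`. This is a sketch, not the driver code: `parse_ulong_strict()` is a hypothetical helper, and its acceptance of a single trailing newline is modelled on my understanding of how the kernel's string-to-number helpers treat sysfs-style input.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert s to an unsigned long, writing *out only
 * on success. Returns 0, -EINVAL (not a clean number) or -ERANGE. */
int parse_ulong_strict(const char *s, int base, unsigned long *out)
{
    char *end;
    unsigned long value;

    /* strtoul() itself would skip whitespace and accept a sign;
     * reject those up front so only plain digits are accepted. */
    if (s == NULL || !isalnum((unsigned char)*s))
        return -EINVAL;

    errno = 0;
    value = strtoul(s, &end, base);
    if (end == s)
        return -EINVAL;   /* no digits consumed */
    if (errno == ERANGE)
        return -ERANGE;   /* value does not fit */
    if (*end == '\n')
        end++;            /* tolerate one trailing newline */
    if (*end != '\0')
        return -EINVAL;   /* trailing garbage */

    *out = value;
    return 0;
}
```

The caller checks the return value first and touches the output variable only when it is 0, which is the check the commit adds before `wqm_quanta_value` is used.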
    • RDMA/cxgb3: Don't post zero-byte read if endpoint is going away · 80783868
      Steve Wise committed
      tx_ack() wasn't checking the endpoint state and consequently would
      attempt to post the p2p 0B read on an endpoint/QP that is closing or
      aborting.  This causes a NULL pointer dereference crash.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/cxgb4: Use completion objects for event blocking · c337374b
      Steve Wise committed
      There exists a race condition when using wait_queue_head_t objects
      that are declared on the stack.  This was being done in a few places
      where we are sending work requests to the FW and awaiting replies, but
      we don't have an endpoint structure with an embedded c4iw_wr_wait
      struct.  So the code was allocating it locally on the stack.  Bad
      design.  The race is:
      
        1) thread on cpuX declares the wait_queue_head_t on the stack, then
           posts a firmware WR with that wait object ptr as the cookie to be
           returned in the WR reply.  This thread will proceed to block in
           wait_event_timeout() but before it does:
      
        2) An interrupt runs on cpuY with the WR reply.  fw6_msg() handles
           this and calls c4iw_wake_up().  c4iw_wake_up() sets the condition
           variable in the c4iw_wr_wait object to TRUE and will call
           wake_up(), but before it calls wake_up():
      
        3) The thread on cpuX calls c4iw_wait_for_reply(), which calls
           wait_event_timeout().  The wait_event_timeout() macro checks the
           condition variable and returns immediately since it is TRUE.  So
           this thread never blocks/sleeps. The function then returns
           effectively deallocating the c4iw_wr_wait object that was on the
           stack.
      
        4) So at this point cpuY has a pointer to the c4iw_wr_wait object
           that is no longer valid.  Further, it's pointing to a stack frame
           that might now be in use by some other context/thread.  So cpuY
           continues execution and calls wake_up() on a ptr to a wait object
           that has been effectively deallocated.
      
      This race, when it hits, can cause a crash in wake_up(), which I've
      seen under heavy stress. It can also corrupt the referenced stack
      which can cause any number of failures.
      
      The fix:
      
      Use struct completion, which supports on-stack declarations.
      Completions use a spinlock around setting the condition to true and
      the wake up so that steps 2 and 4 above are atomic and step 3 can
      never happen in-between.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
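The locking idea in the fix above can be sketched in userspace with POSIX threads. This is an analogy under stated assumptions, not the kernel implementation: the names mirror the kernel's completion API for readability, but the functions below are re-implemented here on top of a mutex and a condition variable. The point it illustrates is that the flag update and the wake-up happen under one lock, and the waiter must take the same lock before it can return, so it cannot observe the flag and free the object while the waker is between the two steps.

```c
#include <pthread.h>

/* Userspace stand-in for a completion object (illustrative only). */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t  wait;
    unsigned int    done;   /* number of completions not yet consumed */
};

void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->wait, NULL);
    c->done = 0;
}

/* Waker side: set the condition and wake the waiter as one atomic step
 * (steps 2 and 4 of the race description, with nothing in between). */
void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done++;
    pthread_cond_signal(&c->wait);
    pthread_mutex_unlock(&c->lock);
}

/* Waiter side: the condition is only ever tested with the lock held, so
 * this cannot return while complete() is still inside its critical
 * section. */
void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)
        pthread_cond_wait(&c->wait, &c->lock);
    c->done--;
    pthread_mutex_unlock(&c->lock);
}
```

With this shape a waiter may keep the object on its own stack: by the time `wait_for_completion()` returns, the waker has already left the locked region and no longer dereferences the object.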
  4. 24 May, 2011 5 commits
  5. 23 May, 2011 1 commit
    • Add appropriate <linux/prefetch.h> include for prefetch users · 70c71606
      Paul Gortmaker committed
      After discovering that wide use of prefetch on modern CPUs
      could be a net loss instead of a win, net drivers which were
      relying on the implicit inclusion of prefetch.h via the list
      headers showed up in the resulting cleanup fallout.  Give
      them an explicit include via the following $0.02 script.
      
       =========================================
       #!/bin/bash
       MANUAL=""
       for i in `git grep -l 'prefetch(.*)' .` ; do
       	grep -q '<linux/prefetch.h>' $i
       	if [ $? = 0 ] ; then
       		continue
       	fi
      
       	(	echo '?^#include <linux/?a'
       		echo '#include <linux/prefetch.h>'
       		echo .
       		echo w
       		echo q
       	) | ed -s $i > /dev/null 2>&1
       	if [ $? != 0 ]; then
       		echo $i needs manual fixup
       		MANUAL="$i $MANUAL"
       	fi
       done
       echo ------------------- 8\<----------------------
       echo vi $MANUAL
       =========================================
      Signed-off-by: Paul <paul.gortmaker@windriver.com>
      [ Fixed up some incorrect #include placements, and added some
        non-network drivers and the fib_trie.c case    - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 21 May, 2011 2 commits
  7. 12 May, 2011 1 commit
  8. 11 May, 2011 1 commit
  9. 10 May, 2011 10 commits
  10. 04 May, 2011 1 commit
  11. 30 Apr, 2011 1 commit
    • ethtool: cosmetic: Use ethtool ethtool_cmd_speed API · 70739497
      David Decotigny committed
      This updates the network drivers so that they don't access the
      ethtool_cmd::speed field directly, but use ethtool_cmd_speed()
      instead.
      
      For most of the drivers, these changes are purely cosmetic and don't
      fix any problem, such as for those 1GbE/10GbE drivers that indirectly
      call their own ethtool get_settings()/mii_ethtool_gset(). The changes
      are meant to enforce code consistency and provide robustness with
      future larger throughputs, at the expense of a few CPU cycles for each
      ethtool operation.
      
      All drivers compiled with make allyesconfig on x86_64 have been
      updated.
      
      Tested: make allyesconfig on x86_64 + e1000e/bnx2x work
      Signed-off-by: David Decotigny <decot@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
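Why an accessor helps with "future larger throughputs" can be shown with a small sketch. As far as I know, the kernel's `struct ethtool_cmd` stores the link speed in Mb/s split across a 16-bit `speed` field and a 16-bit `speed_hi` field, and `ethtool_cmd_speed()` recombines them. The reduced struct and the two helper names below are hypothetical stand-ins so the example compiles on its own.

```c
#include <stdint.h>

/* Reduced stand-in for struct ethtool_cmd: only the two speed fields. */
struct ethtool_cmd_sketch {
    uint16_t speed;     /* low 16 bits of the speed in Mb/s */
    uint16_t speed_hi;  /* high 16 bits of the speed in Mb/s */
};

/* Split a 32-bit speed across both fields. */
void cmd_speed_set(struct ethtool_cmd_sketch *ep, uint32_t speed)
{
    ep->speed = (uint16_t)speed;
    ep->speed_hi = (uint16_t)(speed >> 16);
}

/* Recombine both fields into the full 32-bit speed. */
uint32_t cmd_speed_get(const struct ethtool_cmd_sketch *ep)
{
    return ((uint32_t)ep->speed_hi << 16) | ep->speed;
}
```

A driver that read only the low field would see 34464 for a 100000 Mb/s link, because 100000 does not fit in 16 bits. Going through the accessor returns the full value, which is the consistency the commit is after.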
  12. 27 Apr, 2011 2 commits
  13. 20 Apr, 2011 2 commits
  14. 31 Mar, 2011 1 commit
  15. 25 Mar, 2011 1 commit
  16. 24 Mar, 2011 2 commits
  17. 23 Mar, 2011 1 commit
    • IB: Increase DMA max_segment_size on Mellanox hardware · 7f9e5c48
      David Dillow committed
      By default, each device is assumed to be able to handle only 64 KB chunks
      during DMA. By giving the segment size a larger value, the block layer
      will coalesce more S/G entries together for SRP, allowing larger
      requests with the same sg_tablesize setting.  The block layer is the
      only direct user of it, though a few IOMMU drivers reference it as
      well for their *_map_sg coalescing code: pci-gart_64 on x86, and a
      smattering on sparc, powerpc, and ia64.
      
      Since other IB protocols could potentially see larger segments with
      this, let's check those:
      
       - iSER is fine, because you limit your maximum request size to 512
         KB, so we'll never overrun the page vector in struct iser_page_vec
         (128 entries currently). It is independent of the DMA segment size,
         and handles multi-page segments already.
      
       - IPoIB is fine, as it maps each page individually, and doesn't use
         ib_dma_map_sg().
      
       - RDS appears to do the right thing and has no dependencies on DMA
         segment size, but I don't claim to have done a complete audit.
      
       - NFSoRDMA and 9p are OK -- they do not use ib_dma_map_sg(), so they
         don't care about the coalescing.
      
       - Lustre's ko2iblnd does not care about coalescing -- it properly
         walks the returned sg list.
      
      This patch ups the value on Mellanox hardware to 1 GB, which matches
      reported firmware limits on mlx4.
      Signed-off-by: David Dillow <dillowda@ornl.gov>
      Signed-off-by: Roland Dreier <roland@purestorage.com>