- 15 7月, 2008 40 次提交
-
-
由 Roland Dreier 提交于
Exactly when the catastrophic error polling timer function runs is not important, so use round_jiffies() to save unnecessary wakeups. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Since we use del_timer_sync() anyway, there's no need for an additional flag to tell the timer not to rearm. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
Increase IPoIB ring sizes to twice their original sizes (RX: 128->256, TX: 64->128) to act as a shock absorber for high traffic peaks. With the current settings, we have seen cases that there are many calls to netif_stop_queue(), which causes degradation in throughput. Also, larger receive buffer sizes help IPoIB in CM mode to avoid experiencing RNR NAK conditions due to insufficient receive buffers at the SRQ. Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
Since IPoIB connected mode does not NETIF_F_SG, we only have one DMA mapping per send, so we don't need a mapping[] array. Define a new struct with a single u64 mapping member and use it for the CM tx_ring. Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Ralph Campbell 提交于
The IB spe. for SubnGet(NodeInfo) and query HCA says that the vendor ID field should be the IEEE OUI assigned to the vendor. The ipath driver was returning the PCI vendor ID instead. This will affect applications which call ibv_query_device(). The old value was 0x001fc1 or 0x001077, the new value is 0x001175. The vendor ID doesn't appear to be exported via /sys so that should reduce possible compatibility issues. I'm only aware of Open MPI as a major application which depends on this change, and they have made necessary adjustments. Signed-off-by: NRalph Campbell <ralph.campbell@qlogic.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
When the driver sets the MTU of the net device outside of its change_mtu method, it should make use of dev_set_mtu() instead of directly setting the mtu field of struct netdevice. Otherwise functions registered to be called upon MTU change will not get called (this is done through call_netdevice_notifiers() in dev_set_mtu()). Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
Use of this lock is required to synchronize changes to the netdvice's data structs. Also move the call to ipoib_flush_paths() after the modification of the netdevice flags in set_mode(). Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
ipoib_mcast_detach() does nothing except call ib_detach_mcast(), so just use the core API in the one place that does a multicast group detach. add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-105 (-105) function old new delta ipoib_mcast_leave 357 319 -38 ipoib_mcast_detach 67 - -67 Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
The current code will set the Q_Key for any join of a non-sendonly multicast group. The operation involves a modify QP operation, which is fairly heavyweight, and is only really required after the join of the broadcast group. Fix this by adding a parameter to ipoib_mcast_attach() to control when the Q_Key is set. Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
No need for a mutex around calls to ib_attach_mcast/ib_detach_mcast since these operations are synchronized at the HW driver layer. Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
The IPOIB_MCAST_STARTED flag is not used at all since commit b3e2749b ("IPoIB: Don't drop multicast sends when they can be queued"), so remove it. Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Every caller of nes_post_cqp_request() passed it NES_CQP_REQUEST_RING_DOORBELL, so just remove that parameter and always ring the doorbell. Signed-off-by: NRoland Dreier <rolandd@cisco.com> Acked-by: NFaisal Latif <flatif@neteffect.com>
-
由 Jon Mason 提交于
cxgb3 does not currently report the page size capabilities, and incorrectly reports them internally. This version changes the bit-shifting to a static value (per Steve's request). Signed-off-by: NJon Mason <jon@opengridcomputing.com> Acked-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
The iw_nes driver repeats the logic if (atomic_dec_and_test(&cqp_request->refcount)) { if (cqp_request->dynamic) { kfree(cqp_request); } else { spin_lock_irqsave(&nesdev->cqp.lock, flags); list_add_tail(&cqp_request->list, &nesdev->cqp_avail_reqs); spin_unlock_irqrestore(&nesdev->cqp.lock, flags); } } over and over. Wrap this up in functions nes_free_cqp_request() and nes_put_cqp_request() to simplify such code. In addition to making the source smaller and more readable, this shrinks the compiled code quite a bit: add/remove: 2/0 grow/shrink: 0/13 up/down: 164/-1692 (-1528) function old new delta nes_free_cqp_request - 147 +147 nes_put_cqp_request - 17 +17 nes_modify_qp 2316 2293 -23 nes_hw_modify_qp 737 657 -80 nes_dereg_mr 945 860 -85 flush_wqes 501 416 -85 nes_manage_apbvt 648 560 -88 nes_reg_mr 1117 1026 -91 nes_cqp_ce_handler 927 769 -158 nes_alloc_mw 1052 884 -168 nes_create_qp 5314 5141 -173 nes_alloc_fmr 2212 2035 -177 nes_destroy_cq 1097 918 -179 nes_create_cq 2787 2598 -189 nes_dealloc_mw 762 566 -196 Signed-off-by: NRoland Dreier <rolandd@cisco.com> Acked-by: NFaisal Latif <flatif@neteffect.com>
-
由 Moni Shoua 提交于
The patch tries to solve the problem of device going down and paths being flushed on an SM change event. The method is to mark the paths as candidates for refresh (by setting the new valid flag to 0), and wait for an ARP probe a new path record query. The solution requires a different and less intrusive handling of SM change event. For that, the second argument of the flush function changes its meaning from a boolean flag to a level. In most cases, SM failover doesn't cause LID change so traffic won't stop. In the rare cases of LID change, the remote host (the one that hadn't changed its LID) will lose connectivity until paths are refreshed. This is no worse than the current state. In fact, preventing the device from going down saves packets that otherwise would be lost. Signed-off-by: NMoni Levy <monil@voltaire.com> Signed-off-by: NMoni Shoua <monis@voltaire.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Joachim Fenkes 提交于
This gives ehca an autogenerated modalias and therefore enables automatic loading. Signed-off-by: NJoachim Fenkes <fenkes@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Vladimir Sokolovsky 提交于
Add "ipoib_use_lro" module parameter to enable LRO and an "ipoib_lro_max_aggr" module parameter to set the max number of packets to be aggregated. Make LRO controllable and LRO statistics accessible through ethtool. Signed-off-by: NVladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Ron Livne 提交于
Set IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK for IPoIB's UD QPs if supported by the underlying device. This creates an improvement of up to 39% in bandwidth when sending multicast packets with IPoIB, and an improvment of 12% in cpu usage. Signed-off-by: NRon Livne <ronli@voltaire.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Ron Livne 提交于
Add support for handling the IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK flag by using the per-multicast group loopback blocking feature of mlx4 hardware. Signed-off-by: NRon Livne <ronli@voltaire.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
- Add a new rdma ctl command called RDMA_GET_MIB to the cxgb3 low level driver to obtain the protocol mib from the rnic hardware. - Add new iw_cxgb3 provider method to get the MIB from the low level driver. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
This patch adds a sysfs attribute group called "proto_stats" under /sys/class/infiniband/$device/ and populates this group with protocol statistics if they exist for a given device. Currently, only iWARP stats are defined, but the code is designed to allow InfiniBand protocol stats if they become available. These stats are per-device and more importantly -not- per port. Details: - Add union rdma_protocol_stats in ib_verbs.h. This union allows defining transport-specific stats. Currently only iwarp stats are defined. - Add struct iw_protocol_stats to define the current set of iwarp protocol stats. - Add new ib_device method called get_proto_stats() to return protocol statistics. - Add logic in core/sysfs.c to create iwarp protocol stats attributes if the device is an RNIC and has a get_proto_stats() method. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(), and these two callers are not synchronized against each other. However, ipoib_cm_post_receive_nonsrq() always reuses the same receive work request and scatter list structures, so multiple callers can end up stepping on each other, which leads to posting garbled work requests. Fix this by having the caller pass in the ib_recv_wr and ib_sge structures to use, and allocating new local structures in ipoib_cm_nonsrq_init_rx(). Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam Nguyen <hnguyen@de.ibm.com>. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Signed-off-by: NRoland Dreier <rolandd@cisco.com> Acked-by: NSean Hefty <sean.hefty@intel.com>
-
由 Roland Dreier 提交于
The members struct iwch_rnic_attributes.vendor_id and .vendor_part_id are write-only, so we might as well get rid of them. Signed-off-by: NRoland Dreier <rolandd@cisco.com> Acked-by: NSteve Wise <swise@opengridcomputing.com>
-
由 Steve Wise 提交于
- set fw_ver - set hw_ver - set max_qp_wr to something reasonable - set max_cqe to something reasonable Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Stefan Roscher 提交于
During corner case testing, we noticed that some versions of ehca do not properly transition to interrupt done in special load situations. This can be resolved by periodically triggering EOI through H_EOI, if EQEs are pending. Signed-off-by: NStefan Roscher <stefan.roscher@de.ibm.com> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Joachim Fenkes 提交于
Signed-off-by: NJoachim Fenkes <fenkes@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Commit 65adfa91 ("IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions") added some extra code to handle a QP state transition from RESET to ERROR. However, the latest 1.2.1 version of the IB spec has clarified that this transition is actually not allowed, so we can remove this extra code again. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Commit b18aad71 ("IB/mthca: Fix RESET to ERROR transition") added some extra code to handle a QP state transition from RESET to ERROR. However, the latest 1.2.1 version of the IB spec has clarified that this transition is actually not allowed, so we can remove this extra code again. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Ralph Campbell 提交于
I was reviewing the QP state transition diagram in the IB 1.2.1 spec and the code for qp_state_table[], and noticed that the code allows a QP to be modified from IB_QPS_RESET to IB_QPS_ERR whereas the notes for figure 124 (pg 457) specifically says that this transition isn't allowed. This is a clarification from earlier versions of the IB spec, which were ambiguous in this area and suggested that the RESET to ERR transition was allowed. Fix up the qp_state_table[] to make RESET->ERR not allowed. Signed-off-by: NRalph Campbell <ralph.campbell@qlogic.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
ConnectX HCAs support the IB_MGMT_CLASS_CONG_MGMT management class, so process MADs of this class through the MAD_IFC firmware command. Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
ConnectX returns the max message size it supports through the QUERY_DEV_CAP firmware command. When modifying a QP to RTR, the max message size for the QP must be specified. This value must not exceed the value declared through QUERY_DEV_CAP. The current code ignores the max allowed size and unconditionally sets the value to 2^31. This patch sets all QPs to the max value allowed as returned from firmware. Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
- set IB_DEVICE_MEM_MGT_EXTENSIONS capability bit if fw supports it. - set max_fast_reg_page_list_len device attribute. - add iwch_alloc_fast_reg_mr function. - add iwch_alloc_fastreg_pbl - add iwch_free_fastreg_pbl - adjust the WQ depth for kernel mode work queues to account for fastreg possibly taking 2 WR slots. - add fastreg_mr work request support. - add local_inv work request support. - add send_with_inv and send_with_se_inv work request support. - removed useless duplicate enums/defines for TPT/MW/MR stuff. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
This patch adds support for the IB "base memory management extension" (BMME) and the equivalent iWARP operations (which the iWARP verbs mandates all devices must implement). The new operations are: - Allocate an ib_mr for use in fast register work requests. - Allocate/free a physical buffer lists for use in fast register work requests. This allows device drivers to allocate this memory as needed for use in posting send requests (eg via dma_alloc_coherent). - New send queue work requests: * send with remote invalidate * fast register memory region * local invalidate memory region * RDMA read with invalidate local memory region (iWARP only) Consumer interface details: - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added to indicate device support for these features. - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV, IB_WR_RDMA_READ_WITH_INV are added. - A new consumer API function, ib_alloc_mr() is added to allocate fast register memory regions. - New consumer API functions, ib_alloc_fast_reg_page_list() and ib_free_fast_reg_page_list() are added to allocate and free device-specific memory for fast registration page lists. - A new consumer API function, ib_update_fast_reg_key(), is added to allow the key portion of the R_Key and L_Key of a fast registration MR to be updated. Consumers call this if desired before posting a IB_WR_FAST_REG_MR work request. Consumers can use this as follows: - MR is allocated with ib_alloc_mr(). - Page list memory is allocated with ib_alloc_fast_reg_page_list(). - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key(). - MR made VALID and bound to a specific page list via ib_post_send(IB_WR_FAST_REG_MR) - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV), ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with invalidate operation. - MR is deallocated with ib_dereg_mr() - page lists dealloced via ib_free_fast_reg_page_list(). Applications can allocate a fast register MR once, and then can repeatedly bind the MR to different physical block lists (PBLs) via posting work requests to a send queue (SQ). For each outstanding MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be allocated (the fast_reg_page_list is owned by the low-level driver from the consumer posting a work request until the request completes). Thus pipelining can be achieved while still allowing device-specific page_list processing. The 32-bit fast register memory key/STag is composed of a 24-bit index and an 8-bit key. The application can change the key each time it fast registers thus allowing more control over the peer's use of the key/STag (ie it can effectively be changed each time the rkey is rebound to a page list). Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
The connected mode implementation in the IPoIB driver has a large overhead in the way SKBs are handled in the receive flow. It usually allocates an SKB with as big as was used in the currently received SKB and moves unused fragments from the old SKB to the new one. This involves a loop on all the remaining fragments and incurs overhead on the CPU. This patch, for small SKBs, allocates an SKB just large enough to contain the received data and copies to it the data from the received SKB. The newly allocated SKB is passed to the stack and the old SKB is reposted. When running netperf, UDP small messages, without this pach I get: UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 14.4.3.178 (14.4.3.178) port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 114688 128 10.00 5142034 0 526.31 114688 10.00 1130489 115.71 With this patch I get both send and receive at ~315 mbps. The reason that send performance actually slows down is as follows: When using this patch, the overhead of the CPU for handling RX packets is dramatically reduced. As a result, we do not experience RNR NAK messages from the receiver which cause the connection to be closed and reopened again; when the patch is not used, the receiver cannot handle the packets fast enough so there is less time to post new buffers and hence the mentioned RNR NACKs. So what happens is that the application *thinks* it posted a certain number of packets for transmission but these packets are flushed and do not really get transmitted. Since the connection gets opened and closed many times, each time netperf gets the CPU time that otherwise would have been given to IPoIB to actually transmit the packets. This can be verified when looking at the port counters -- the output of ifconfig and the oputput of netperf (this is for the case without the patch): tx packets ========== port counter: 1,543,996 ifconfig: 1,581,426 netperf: 5,142,034 rx packets ========== netperf 1,1304,089 Signed-off-by: NEli Cohen <eli@mellanox.co.il>
-
由 Roland Dreier 提交于
They don't get updated by git and so they're worse than useless. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Robert P. J. Day 提交于
Signed-off-by: NRobert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
The idea is that for QPs with fixed size work requests (eg selective signaling QPs), before stamping the WQE, we read the value of the DS field, which gives the effective size of the descriptor as used in the previous post. Then we stamp only that area, since the rest of the descriptor is already stamped. When initializing the send queue buffer, make sure the DS field is initialized to the max descriptor size so that the subsequent stamping will be done on the entire descriptor area. Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Moni Shoua 提交于
This patch solves a race that occurs after an event occurs that causes the SA query module to flush its SM address handle (AH). When SM AH becomes invalid and needs an update it is handled by the global workqueue. On the other hand this event is also handled in the IPoIB driver by queuing work in the ipoib_workqueue that does multicast joins. Although queuing is in the right order, it is done to 2 different workqueues and so there is no guarantee that the first to be queued is the first to be executed. This causes a problem because IPoIB may end up sending an request to the old SM, which will take a long time to time out (since the old SM is gone); this leads to a much longer than necessary interruption in multicast traffer. The patch sets the SA query module's SM AH to NULL when the event occurs, and until update_sm_ah() is done, any request that needs sm_ah fails with -EAGAIN return status. For consumers, the patch doesn't make things worse. Before the patch, MADs are sent to the wrong SM so the request gets lost. Consumers can be improved if they examine the return code and respond to EAGAIN properly but even without an improvement the situation is not getting worse. Signed-off-by: NMoni Levy <monil@voltaire.com> Signed-off-by: NMoni Shoua <monis@voltaire.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-