- 01 Oct, 2008 1 commit
-
-
Committed by Roland Dreier
Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling tx_lock. Not only do we want to get rid of LLTX, this actually causes problems because of the skb_orphan() done with this tx_lock held: some skb destructors expect to be run with interrupts enabled. The simplest fix for this is to get rid of the driver-private tx_lock and stop using LLTX. We kill off priv->tx_lock and use netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit tricky because we need to update places that take priv->lock inside the tx_lock to disable IRQs, rather than relying on tx_lock having already disabled IRQs. Also, there are a couple of places where we need to disable BHs to make sure we have a consistent context to call netif_tx_lock() (since we can no longer use _irqsave() variants), and we also have to change ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather than directly, because ipoib_send_comp_handler() runs in interrupt context and drain_tx_cq() must run in BH context so it can call netif_tx_lock(). Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 26 Sep, 2008 1 commit
-
-
Committed by Roland Dreier
Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM change events") changed how paths are flushed on an SM event. This change introduces a problem if the resulting path record query fails, causing path->ah to become NULL. A later successful path query will then trigger WARN_ON() in path_rec_completion(), and crash because path->ah has already been freed, so the ipoib_put_ah() inside the lock in path_rec_completion() may actually drop the last reference (contrary to the comment that claims this is safe). Fix this by updating path->ah and freeing old_ah only when the path record query is successful. This prevents the neighbour AH and the path AH from getting out of sync. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1194> Reported-by: Rabah Salem <ravah@mellanox.com> Debugged-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
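The fix described above can be sketched in userspace C. This is only an illustration of "swap the handle and release the old one on success, do nothing on failure"; the types and the names `put_ah()` and `path_query_done()` are hypothetical stand-ins, not the driver's code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's address handle and path. */
struct ah {
    int refcount;
};

struct path {
    struct ah *ah;   /* address handle currently used for sends */
};

/* Drop one reference; free the handle when the last one goes away. */
void put_ah(struct ah *ah)
{
    if (ah && --ah->refcount == 0)
        free(ah);
}

/*
 * Completion of a path record query. status == 0 means success and
 * new_ah is the freshly created handle. On failure the path keeps its
 * old handle, so path->ah never becomes NULL underneath its users.
 */
void path_query_done(struct path *path, int status, struct ah *new_ah)
{
    struct ah *old_ah;

    if (status != 0)
        return;              /* failed query: leave path->ah untouched */

    old_ah = path->ah;
    path->ah = new_ah;       /* switch to the new handle first ... */
    put_ah(old_ah);          /* ... then release the old one */
}
```

With this shape, a failed query followed by a successful one sees a valid old handle at the moment of the swap, which is the property the commit message relies on.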
-
- 17 Sep, 2008 1 commit
-
-
Committed by Yossi Etigin
Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with ipoib_stop(). We avoid it by scheduling the piece of code that takes the lock on ipoib_workqueue instead of executing it directly. This works because we only flush the ipoib_workqueue with the RTNL not held. The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter calls ib_sa_free_multicast(), and this waits until the multicast completion handler finishes. This handler is ipoib_mcast_join_complete(), which waits for the rtnl_lock(), which was already taken by ipoib_stop(). This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on RTNL in ipoib_stop()"). Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
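The deferral pattern in this message can be modelled without any kernel machinery. The following is a minimal single-threaded sketch, assuming a tiny fixed-size queue; `struct work_queue`, `defer_work()`, `run_deferred()` and `locked_step()` are hypothetical names chosen for illustration and are not the kernel workqueue API.

```c
#include <stddef.h>

#define MAX_WORK 8

typedef void (*work_fn)(void *arg);

/* Hypothetical work queue; a real one would be drained by a worker thread. */
struct work_queue {
    work_fn fn[MAX_WORK];
    void *arg[MAX_WORK];
    size_t count;
};

/* Called from the completion handler: record the work, take no locks. */
int defer_work(struct work_queue *q, work_fn fn, void *arg)
{
    if (q->count == MAX_WORK)
        return -1;
    q->fn[q->count] = fn;
    q->arg[q->count] = arg;
    q->count++;
    return 0;
}

/* Called later from worker context, where taking the big lock is safe. */
size_t run_deferred(struct work_queue *q)
{
    size_t i, ran = q->count;

    for (i = 0; i < ran; i++)
        q->fn[i](q->arg[i]);
    q->count = 0;
    return ran;
}

/* Example work item standing in for the step that needs the lock. */
void locked_step(void *arg)
{
    int *done = arg;

    *done = 1;
}
```

The point of the design is that the completion handler returns immediately, so whoever is waiting for it to finish while holding the lock is never blocked by it.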
-
- 24 Aug, 2008 1 commit
-
-
Committed by Adrian Bunk
This patch lets the files using linux/version.h match the files that #include it. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 20 Aug, 2008 1 commit
-
-
Committed by Roland Dreier
Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which is run from the ipoib_workqueue. However, ipoib_stop() (which is run inside rtnl_lock()) flushes this workqueue, which leads to a deadlock if the join task is pending. Fix this by simply not flushing the workqueue from ipoib_stop(). It turns out that we really don't care about workqueue tasks running during or after ipoib_stop(), as long as we make sure to flush the workqueue before unregistering a netdev. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 09 Aug, 2008 1 commit
-
-
Committed by David J. Wilder
There are users that are running UDP applications that require a large receive queue size in order to get good performance. To prevent allocation failures for rx_rings when using non-SRQ mode and large recv_queue_size (1K or larger), use vmalloc() instead of kcalloc() to allocate rx_rings. Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 30 Jul, 2008 1 commit
-
-
Committed by Roland Dreier
wr->sg_list should be set to the sge pointer passed in, not priv->cm.rx_sge. Reported-by: Hoang-Nam Nguyen <HNGUYEN@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 25 Jul, 2008 2 commits
-
-
Committed by Roland Dreier
The help text for INFINIBAND_IPOIB_DEBUG refers to "ipoib_debugfs," which no longer exists. Correct this to talk about the files under debugfs that are really created. Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Committed by Roland Dreier
Connected mode is now tested and used by lots of people. No need to hide it under CONFIG_EXPERIMENTAL. Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 23 Jul, 2008 2 commits
-
-
Committed by Or Gerlitz
Print the return code of ib_sa_path_rec_get() if it fails to help debug errors. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Committed by Or Gerlitz
Enhance iser to act upon notification on network stack changes that make its RDMA connection unaligned with the link used by the stack for the <src,dst> IPs used to establish the connection. When RDMA_CM_EVENT_ADDR_CHANGE arrives, just disconnect the connection, assuming that the user space iscsid daemon will reconnect, and the new connection will be aligned with the IP stack. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 15 Jul, 2008 17 commits
-
-
Committed by David S. Miller
Now that we have a specific lock to protect the network device unicast and multicast lists, remove extraneous grabs of the TX lock in cases where the code only needs address list protection. Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by David S. Miller
Add netif_addr_{lock,unlock}{,_bh}() helpers. Use them to protect operations that operate on or read the network device unicast and multicast address lists. Also use them in cases where the code simply wants to block calls into the driver's ->set_rx_mode() and ->set_multicast_list() methods. Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eli Cohen
Increase IPoIB ring sizes to twice their original sizes (RX: 128->256, TX: 64->128) to act as a shock absorber for high traffic peaks. With the current settings, we have seen cases where there are many calls to netif_stop_queue(), which causes degradation in throughput. Also, larger receive buffer sizes help IPoIB in CM mode to avoid experiencing RNR NAK conditions due to insufficient receive buffers at the SRQ. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Committed by Eli Cohen
Since IPoIB connected mode does not set NETIF_F_SG, we only have one DMA mapping per send, so we don't need a mapping[] array. Define a new struct with a single u64 mapping member and use it for the CM tx_ring. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
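To make the saving concrete, here is a small sketch comparing a ring entry that carries a mapping[] array with one that carries a single mapping. The struct names and the fragment count `SKETCH_MAX_FRAGS` are assumptions made for illustration; the real value depends on the kernel configuration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_FRAGS 18   /* assumed stand-in for the kernel's fragment limit */

struct sk_buff;               /* opaque in this sketch */

/* Entry sized for scatter/gather: linear part plus one mapping per fragment. */
struct tx_buf {
    struct sk_buff *skb;
    uint64_t mapping[SKETCH_MAX_FRAGS + 1];
};

/* Connected-mode entry: every send is one linear buffer, so one mapping. */
struct cm_tx_buf {
    struct sk_buff *skb;
    uint64_t mapping;
};

/* Bytes saved over a whole ring of `entries` slots. */
size_t cm_ring_savings(size_t entries)
{
    return entries * (sizeof(struct tx_buf) - sizeof(struct cm_tx_buf));
}
```

Under these assumptions each slot shrinks by `SKETCH_MAX_FRAGS` 64-bit words, and the saving scales linearly with the ring size.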
-
Committed by Eli Cohen
When the driver sets the MTU of the net device outside of its change_mtu method, it should make use of dev_set_mtu() instead of directly setting the mtu field of struct net_device. Otherwise functions registered to be called upon MTU change will not get called (this is done through call_netdevice_notifiers() in dev_set_mtu()). Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
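The difference between storing the field directly and going through a setter that notifies subscribers can be shown with a toy model. Everything below is a hypothetical sketch (one subscriber, invented names); it is not the kernel's notifier chain.

```c
#include <stddef.h>

struct net_dev;
typedef void (*mtu_notifier)(struct net_dev *dev, int new_mtu);

/* Hypothetical device with a single MTU subscriber to keep the sketch short. */
struct net_dev {
    int mtu;
    mtu_notifier notifier;
};

/* Direct store: the subscriber never hears about the change. */
void set_mtu_direct(struct net_dev *dev, int mtu)
{
    dev->mtu = mtu;
}

/* Setter path: update the field, then tell the subscriber. */
void set_mtu_notify(struct net_dev *dev, int mtu)
{
    dev->mtu = mtu;
    if (dev->notifier)
        dev->notifier(dev, mtu);
}

/* Example subscriber: remembers the last MTU it was told about. */
int last_seen_mtu;

void remember_mtu(struct net_dev *dev, int new_mtu)
{
    (void)dev;
    last_seen_mtu = new_mtu;
}
```

After `set_mtu_direct()` the device and the subscriber disagree about the MTU; after `set_mtu_notify()` they agree, which is the behaviour the commit wants from the driver.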
-
Committed by Eli Cohen
Use of this lock is required to synchronize changes to the netdevice's data structs. Also move the call to ipoib_flush_paths() after the modification of the netdevice flags in set_mode(). Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Committed by Roland Dreier
ipoib_mcast_detach() does nothing except call ib_detach_mcast(), so just use the core API in the one place that does a multicast group detach.

add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-105 (-105)
function                  old   new   delta
ipoib_mcast_leave         357   319     -38
ipoib_mcast_detach         67     -     -67

Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Committed by Eli Cohen
The current code will set the Q_Key for any join of a non-sendonly multicast group. The operation involves a modify QP operation, which is fairly heavyweight, and is only really required after the join of the broadcast group. Fix this by adding a parameter to ipoib_mcast_attach() to control when the Q_Key is set. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
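A toy model shows how the extra parameter cuts down the number of expensive operations. The `struct qp` counters and the functions `mcast_attach()` and `join_groups()` below are hypothetical and only count calls; they do not touch any real verbs API.

```c
/* Hypothetical QP model: counts the expensive modify operations. */
struct qp {
    int modify_count;
    int attach_count;
};

/* Attach to a multicast group; set the Q_Key only when asked to. */
void mcast_attach(struct qp *qp, int set_qkey)
{
    if (set_qkey)
        qp->modify_count++;   /* stands in for the modify QP operation */
    qp->attach_count++;
}

/* Join the broadcast group first, then n ordinary groups. */
void join_groups(struct qp *qp, int n)
{
    int i;

    mcast_attach(qp, 1);          /* broadcast group: Q_Key needed */
    for (i = 0; i < n; i++)
        mcast_attach(qp, 0);      /* other groups: skip the modify */
}
```

In this model the number of modify operations stays at one regardless of how many groups are joined afterwards.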
-
Committed by Eli Cohen
No need for a mutex around calls to ib_attach_mcast/ib_detach_mcast since these operations are synchronized at the HW driver layer. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Committed by Eli Cohen
The IPOIB_MCAST_STARTED flag is not used at all since commit b3e2749b ("IPoIB: Don't drop multicast sends when they can be queued"), so remove it. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Committed by Moni Shoua
The patch tries to solve the problem of the device going down and paths being flushed on an SM change event. The method is to mark the paths as candidates for refresh (by setting the new valid flag to 0), and wait for an ARP probe to trigger a new path record query. The solution requires a different and less intrusive handling of SM change events. For that, the second argument of the flush function changes its meaning from a boolean flag to a level. In most cases, SM failover doesn't cause a LID change, so traffic won't stop. In the rare cases of a LID change, the remote host (the one that hadn't changed its LID) will lose connectivity until paths are refreshed. This is no worse than the current state. In fact, preventing the device from going down saves packets that otherwise would be lost. Signed-off-by: Moni Levy <monil@voltaire.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
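The move from a boolean to a level, combined with the valid flag, can be sketched as follows. The level names and what each level does are assumptions made for this illustration; the commit message only says that the argument becomes a level and that paths are marked by clearing a valid flag.

```c
#include <stddef.h>

/* Hypothetical flush levels, from least to most intrusive. */
enum flush_level {
    FLUSH_LIGHT,    /* SM change: only mark paths for refresh */
    FLUSH_NORMAL,   /* also take the interface down and back up */
    FLUSH_HEAVY     /* also rebuild device resources */
};

struct path {
    int valid;      /* 0 means "re-query before the next use" */
};

/*
 * Mark every path as a refresh candidate and report whether the device
 * has to go down for the requested level (1) or can stay up (0).
 */
int flush_paths(struct path *paths, size_t n, enum flush_level level)
{
    size_t i;

    for (i = 0; i < n; i++)
        paths[i].valid = 0;

    return level >= FLUSH_NORMAL;
}
```

An enum leaves room for a cheap reaction to SM change events without changing what the more intrusive callers get.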
-
Committed by Vladimir Sokolovsky
Add "ipoib_use_lro" module parameter to enable LRO and an "ipoib_lro_max_aggr" module parameter to set the max number of packets to be aggregated. Make LRO controllable and LRO statistics accessible through ethtool. Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Committed by Ron Livne
Set IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK for IPoIB's UD QPs if supported by the underlying device. This creates an improvement of up to 39% in bandwidth when sending multicast packets with IPoIB, and an improvement of 12% in CPU usage. Signed-off-by: Ron Livne <ronli@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Committed by Roland Dreier
For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(), and these two callers are not synchronized against each other. However, ipoib_cm_post_receive_nonsrq() always reuses the same receive work request and scatter list structures, so multiple callers can end up stepping on each other, which leads to posting garbled work requests. Fix this by having the caller pass in the ib_recv_wr and ib_sge structures to use, and allocating new local structures in ipoib_cm_nonsrq_init_rx(). Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam Nguyen <hnguyen@de.ibm.com>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
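The idea of caller-owned request structures can be sketched in plain C. The `struct sge` and `struct recv_wr` types below are simplified, hypothetical stand-ins for the verbs structures, and `prepare_recv()` is an invented name.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for a scatter/gather entry and a receive request. */
struct sge {
    uint64_t addr;
    uint32_t length;
};

struct recv_wr {
    uint64_t wr_id;
    struct sge *sg_list;
    int num_sge;
};

/*
 * Fill in a receive request using storage owned by the caller.
 * Nothing here touches shared state, so two callers with their own
 * wr/sge pairs cannot overwrite each other's requests.
 */
void prepare_recv(struct recv_wr *wr, struct sge *sge,
                  uint64_t id, uint64_t addr, uint32_t len)
{
    sge->addr = addr;
    sge->length = len;

    wr->wr_id = id;
    wr->sg_list = sge;   /* the sge passed in, not a shared one */
    wr->num_sge = 1;
}
```

Pointing `sg_list` at the caller's own entry is the detail that matters: if it pointed at a shared structure, the second caller would silently change the first caller's request.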
-
Committed by Eli Cohen
The connected mode implementation in the IPoIB driver has a large overhead in the way SKBs are handled in the receive flow. It usually allocates an SKB as big as the one used for the currently received SKB and moves unused fragments from the old SKB to the new one. This involves a loop over all the remaining fragments and incurs overhead on the CPU. This patch, for small SKBs, allocates an SKB just large enough to contain the received data and copies the data from the received SKB into it. The newly allocated SKB is passed to the stack and the old SKB is reposted.

When running netperf, UDP small messages, without this patch I get:

UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 14.4.3.178 (14.4.3.178) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

114688     128   10.00     5142034      0     526.31
114688           10.00     1130489            115.71

With this patch I get both send and receive at ~315 mbps.

The reason that send performance actually slows down is as follows: When using this patch, the overhead of the CPU for handling RX packets is dramatically reduced. As a result, we do not experience RNR NAK messages from the receiver which cause the connection to be closed and reopened again; when the patch is not used, the receiver cannot handle the packets fast enough so there is less time to post new buffers and hence the mentioned RNR NAKs. So what happens is that the application *thinks* it posted a certain number of packets for transmission but these packets are flushed and do not really get transmitted. Since the connection gets opened and closed many times, each time netperf gets the CPU time that otherwise would have been given to IPoIB to actually transmit the packets.

This can be verified when looking at the port counters -- the output of ifconfig and the output of netperf (this is for the case without the patch):

tx packets
==========
port counter:   1,543,996
ifconfig:       1,581,426
netperf:        5,142,034

rx packets
==========
netperf         1,130,489

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
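The receive-side change described in this message is a copy-for-small-packets scheme. The sketch below models it with plain buffers; the threshold value and the names `is_small_packet()` and `copy_small_packet()` are assumptions for illustration, and the driver's own cut-off may differ.

```c
#include <stdlib.h>
#include <string.h>

#define COPY_THRESHOLD 256   /* assumed cut-off in bytes */

/* Small packets are cheaper to copy than to re-fragment. */
int is_small_packet(size_t len)
{
    return len < COPY_THRESHOLD;
}

/*
 * Copy a small packet out of the ring buffer into a right-sized buffer.
 * The ring buffer is left untouched, so it can be reposted as is.
 * Returns NULL for large packets (the caller falls back to the
 * fragment-moving path) and on allocation failure.
 */
unsigned char *copy_small_packet(const unsigned char *ring_buf, size_t len)
{
    unsigned char *copy;

    if (!is_small_packet(len))
        return NULL;

    copy = malloc(len ? len : 1);
    if (copy)
        memcpy(copy, ring_buf, len);
    return copy;
}
```

The trade is one short memcpy against a loop over fragments plus a large allocation, which is why it only pays off below some size.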
-
Committed by Roland Dreier
They don't get updated by git and so they're worse than useless. Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Committed by Roland Dreier
The SRP initiator is currently using ib_find_cached_pkey() and ib_get_cached_gid() in situations where the uncached ib_find_pkey() and ib_query_gid() functions serve just as well: sleeping is allowed and performance is not an issue. Since we want to eliminate the cached operations in the long term, convert SRP to use the uncached variants. Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 12 Jul, 2008 12 commits
-
-
Committed by Mike Christie
This patch fixes two bugs that are related. 1. Old tools did not set can_queue/cmds_max. This patch modifies libiscsi so that when we add the host we catch this and set it to the default. 2. iscsi_tcp thought that the scsi command that was passed to the eh functions needed an iscsi_cmd_task allocated for it. It only needed a mgmt task, and now it does not matter since it all comes from the same pool and libiscsi handles this for the drivers. ib_iser had copied iscsi_tcp's code and set can_queue to its max - 1 to handle this. So this patch removes the max - 1, and just sets it to the max. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
-
Committed by Mike Christie
The recv lock was defined so the iscsi layer could block the recv path from processing IO during recovery. It turns out iser just set a lock to that pointer, which was pointless. We now disconnect the transport connection before doing recovery so we do not need the recv lock. For iscsi_tcp we still stop the recv path in case older tools are being used. This patch also has the iscsi_itt_to_ctask user grab the session lock and has the caller access the task with the lock or get a ref to it in case the target is broken and sends a tmf success response then sends data or a response for the command that was supposed to be affected by the tmf. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
-
Committed by Mike Christie
This adds two new attrs used for creating initiator ports and binding sessions to hardware. The session level initiatorname: Since bnx2i does a scsi_host per host device, we need to add the iface initiator port settings on the session, so we can create multiple initiator ports (each with different inames) per device/scsi_host. The current iname reflects that qla4xxx can have one iname per hba, and we are allocating a host per session for software. The iname on the host will remain so we can export and set the hba level qla4xxx setting. The ifacename attr: To bind a session to some piece of hardware in userspace we maintain some mappings, but during boot or iscsid restart (iscsid contains the user space part of the driver) we need to be able to figure out which of those host mapping abstractions map to certain sessions. This patch adds an ifacename attr, which userspace can set to id the host side of the endpoint across pivot_roots and iscsid restarts. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
-
Committed by Mike Christie
This hooks iser into the iscsi endpoint code. Previously it handled the lookup and allocation. This has been made generic so bnx2i and iser can share it. It also allows us to pass iser the leading conn's ep, so we know the ib_device being used and can set it as the scsi_host's parent. And that allows scsi-ml to set the dma_mask based on those values. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
-
Committed by Mike Christie
Currently we duplicate the list of sessions, because we were using the test for whether a session was on the host list to indicate if the session was bound or unbound. We can instead use the target_id and fix up the class so that drivers like bnx2i do not have to manage the target id space. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
-
Committed by Mike Christie
This handles the iscsi_cmd_task rename and renames the iser cmd task to iser task. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
-
Committed by Mike Christie
Convert ib_iser to support merged tasks. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
-
Committed by Mike Christie
Currently to get a ctask from the session cmd array, you have to know to use the itt modifier. To make this easier on LLDs and so in the future we can easily kill the session array and use the host shared map instead, this patch adds a nice wrapper to strip the itt into a session->cmds index and return a ctask. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
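A wrapper of the kind described here can be sketched as a mask plus a bounds check. The bit split (`ITT_INDEX_BITS`), the struct layouts and the name `itt_to_task()` are assumptions for illustration only and do not reproduce libiscsi's definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define ITT_INDEX_BITS 12                          /* assumed split of the tag */
#define ITT_INDEX_MASK ((1u << ITT_INDEX_BITS) - 1)

struct task {
    uint32_t itt;
};

struct session {
    struct task **cmds;    /* array of tasks indexed by the low itt bits */
    uint32_t cmds_max;
};

/*
 * Strip the tag down to its array index and return the task,
 * or NULL if the index is outside the session's array.
 */
struct task *itt_to_task(struct session *s, uint32_t itt)
{
    uint32_t idx = itt & ITT_INDEX_MASK;

    if (idx >= s->cmds_max)
        return NULL;
    return s->cmds[idx];
}
```

Keeping the mask inside one helper means callers never need to know how the tag is laid out, so the backing store can change later without touching them.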
-
Committed by Mike Christie
After the stop_conn callback has returned the LLD should not touch the scsi cmds. iscsi_tcp and libiscsi use the conn->recv_lock and suspend_rx field to halt recv path processing, but iser does not have any protection. This patch modifies iser so that userspace can just call the ep_disconnect callback, which will halt all recv IO, before calling the stop_conn callback so we do not have to worry about the conn->recv_lock and suspend rx field. iser just needs to stop the send side from accessing the ib conn. Fixup to handle when the ep poll fails and ep disconnect is called from Erez. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
-
Committed by Mike Christie
This removes the session and conn data_size fields from the iscsi_transport. Just pass in the value like with host allocation. This patch also makes it so the LLD iscsi_conn data is allocated with the iscsi_cls_conn. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
-
Committed by Mike Christie
This finishes the host/session unbinding by adding some helpers to add and remove hosts and the sessions they manage. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
-
Committed by Mike Christie
bnx2i allocates a host per netdevice but will use libiscsi, so this unbinds the session from the host in that code. This will also be useful for the iser parent device dma settings fixes. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
-