- 26 1月, 2008 2 次提交
-
-
由 Pradeep Satyanarayana 提交于
Some IB adapters (notably IBM's eHCA) do not implement SRQs (shared receive queues). The current IPoIB connected mode support only works on devices that support SRQs. Fix this by adding support for using the receive queue of each connected mode receive QP. The disadvantage of this compared to using an SRQ is that it means a full queue of receives must be posted for each remote connected mode peer, which means that total memory usage is potentially much higher than when using SRQs. To manage this, add a new module parameter "max_nonsrq_conn_qp" that limits the number of connections allowed per interface. The rest of the changes are fairly straightforward: we use a table of struct ipoib_cm_rx to hold all the active connections, and put the table index of the connection in the high bits of receive WR IDs. This is needed because we cannot rely on the struct ib_wc.qp field for non-SRQ receive completions. Most of the rest of the changes just test whether or not an SRQ is available, and post receives or find received packets in the right place depending on the answer. Cleaning up dead connections actually becomes simpler, because we do not have to do the "last WQE reached" dance that is required to destroy QPs attached to an SRQ. We just move the QP to the error state and wait for all pending receives to be flushed. Signed-off-by: NPradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> [ Completely rewritten and split up, based on Pradeep's work. Several bugs fixed and no doubt several bugs introduced. - Roland ] Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 28 11月, 2007 1 次提交
-
-
由 Jack Morgenstein 提交于
If a port goes down, ipoib_ib_dev_down() is invoked -- which flushes the mcasts (clearing priv->broadcast) and clearing the path record cache. If ipoib_start_xmit() is then invoked (before the broadcast group is rejoined), a kernel oops results from attempting to access priv->broadcast, which is still unset. Returning NULL from path_rec_create() if priv->broadcast is NULL is a harmless way of bypassing the problem -- the offending packet is simply discarded "without prejudice." Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 20 10月, 2007 1 次提交
-
-
由 Michael S. Tsirkin 提交于
Use the same CQ for CM send completions as for all other IPoIB completions. This means all completions are processed via the same NAPI polling routine. This should help reduce the number of interrupts for bi-directional traffic (such as TCP) and fixes "driver is hogging interrupts" errors reported for IPoIB send side, e.g. <https://bugs.openfabrics.org/show_bug.cgi?id=508> To do this, keep a per-interface counter of outstanding send WRs, and stop the interface when this counter reaches the send queue size to avoid CQ overruns. Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 16 10月, 2007 2 次提交
-
-
由 Moni Shoua 提交于
When the bonding device senses a carrier loss of its active slave it replaces that slave with a new one. In between the times when the carrier of an IPoIB device goes down and ipoib_neigh is destroyed, it is possible that the bonding driver will send a packet on a new slave that uses an old ipoib_neigh. This patch detects and prevents this from happenning. Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: NRoland Dreier <rdreier@cisco.com> Signed-off-by: NJeff Garzik <jeff@garzik.org>
-
由 Moni Shoua 提交于
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb->dev points to the master device, it changes this skb to point on the slave device and calls the slave hard_start_xmit function. Under this scheme, ipoib_neigh_destructor assumption that for each struct neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev) can be casted to struct ipoib_dev_priv is buggy. To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one, when n->dev->flags has the IFF_MASTER bit set. Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: NRoland Dreier <rdreier@cisco.com> Signed-off-by: NJeff Garzik <jeff@garzik.org>
-
- 11 10月, 2007 6 次提交
-
-
由 Roland Dreier 提交于
The conversion to use netdevice internal stats left an unused variable in ipoib_neigh_free(), since there's no longer any reason to get netdev_priv() in order to increment dropped packets. Delete the unused priv variable. Signed-off-by: NRoland Dreier <rolandd@cisco.com> Signed-off-by: NJeff Garzik <jeff@garzik.org>
-
由 Roland Dreier 提交于
Use the stats member of struct netdevice in IPoIB, so we can save memory by deleting the stats member of struct ipoib_dev_priv, and save code by deleting ipoib_get_stats(). Signed-off-by: NRoland Dreier <rolandd@cisco.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
Since hardware header operations are part of the protocol class not the device instance, make them into a separate object and save memory. Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ralf Baechle 提交于
It's been a useless no-op for long enough in 2.6 so I figured it's time to remove it. The number of people that could object because they're maintaining unified 2.4 and 2.6 drivers is probably rather small. [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ] Signed-off-by: NRalf Baechle <ralf@linux-mips.org> Signed-off-by: NJeff Garzik <jeff@garzik.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
Several devices have multiple independant RX queues per net device, and some have a single interrupt doorbell for several queues. In either case, it's easier to support layouts like that if the structure representing the poll is independant from the net device itself. The signature of the ->poll() call back goes from: int foo_poll(struct net_device *dev, int *budget) to int foo_poll(struct napi_struct *napi, int budget) The caller is returned the number of RX packets processed (or the number of "NAPI credits" consumed if you want to get abstract). The callee no longer messes around bumping dev->quota, *budget, etc. because that is all handled in the caller upon return. The napi_struct is to be embedded in the device driver private data structures. Furthermore, it is the driver's responsibility to disable all NAPI instances in it's ->stop() device close handler. Since the napi_struct is privatized into the driver's private data structures, only the driver knows how to get at all of the napi_struct instances it may have per-device. With lots of help and suggestions from Rusty Russell, Roland Dreier, Michael Chan, Jeff Garzik, and Jamal Hadi Salim. Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra, Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan. [ Ported to current tree and all drivers converted. Integrated Stephen's follow-on kerneldoc additions, and restored poll_list handling to the old style to fix mutual exclusion issues. -DaveM ] Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Or Gerlitz 提交于
The kernel IB stack allows (through the RDMA CM) userspace applications to join and use multicast groups from the IPoIB MGID range. This allows multicast traffic to be handled directly from userspace QPs, without going through the kernel stack, which gives better performance for some applications. However, to fully interoperate with IP multicast, such userspace applications need to participate in IGMP reports and queries, or else routers may not forward the multicast traffic to the system where the application is running. The simplest way to do this is to share the kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to join multicast groups that are being handled directly in userspace. However, in such cases, the actual multicast traffic should not also be handled by the IPoIB interface, because that would burn resources handling multicast packets that will just be discarded in the kernel. To handle this, this patch adds lookup on the database used for IB multicast group reference counting when IPoIB is joining multicast groups, and if a multicast group is already handled by user space, then the IPoIB kernel driver ignores the group. This is controlled by a per-interface policy flag. When the flag is set, IPoIB will not join and attach its QP to a multicast group which already has an entry in the database; when the flag is cleared, IPoIB will behave as before this change. For each IPoIB interface, the /sys/class/net/$intf/umcast attribute controls the policy flag. The default value is off/0. Signed-off-by: NOr Gerlitz <ogerlitz@voltaire.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 10 10月, 2007 2 次提交
-
-
由 Sean Hefty 提交于
To support QoS within and between subnets, modify IPoIB to request specific Traffic Class values with path record queries, using the value associated with the IPoIB broadcast group. Signed-off-by: NSean Hefty <sean.hefty@intel.com> [ See some comments I made on this at v1 and v2 of the posts <http://lists.openfabrics.org/pipermail/general/2007-August/039275.html> <http://lists.openfabrics.org/pipermail/general/2007-September/040312.html> ] Reviewed-by: NOr Gerlitz <ogerlitz@voltaire.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
Clean up properly if ib_query_pkey() or ib_query_gid() fail. Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 19 5月, 2007 1 次提交
-
-
由 Yosef Etigin 提交于
SM reconfiguration or failover possibly causes a shuffling of the values in the P_Key table. Right now, IPoIB only queries for the P_Key index once when it creates the device QP, and hence there are problems if the index of a P_Key value changes. Fix this by using the PKEY_CHANGE event to trigger a recheck of the P_Key index. Signed-off-by: NYosef Etigin <yosefe@voltaire.com> Acked-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 07 5月, 2007 1 次提交
-
-
由 Roland Dreier 提交于
Convert the IP-over-InfiniBand network device driver over to using NAPI to handle completions for the main CQ. This covers all receives as well as datagram mode sends; send completions for connected mode connections are still handled from interrupt context. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 25 4月, 2007 1 次提交
-
-
由 Sean Hefty 提交于
To support destinations that are not on the local IB subnet, IPoIB should include the GRH information when constructing an address handle. Using the existing ib_init_ah_from_path() call will do this for us. Signed-off-by: NSean Hefty <sean.hefty@intel.com>
-
- 26 3月, 2007 1 次提交
-
-
由 Alexey Kuznetsov 提交于
->neigh_destructor() is killed (not used), replaced with ->neigh_cleanup(), which is called when neighbor entry goes to dead state. At this point everything is still valid: neigh->dev, neigh->parms etc. The device should guarantee that dead neighbor entries (neigh->dead != 0) do not get private part initialized, otherwise nobody will cleanup it. I think this is enough for ipoib which is the only user of this thing. Initialization private part of neighbor entries happens in ipib start_xmit routine, which is not reached when device is down. But it would be better to add explicit test for neigh->dead in any case. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 3月, 2007 1 次提交
-
-
由 Michael S. Tsirkin 提交于
The connected mode code added the possibility that an neigh struct gets freed in the list_for_each_entry() loop in path_rec_completion(), which causes a use-after-free. Fix this by changing to the _safe variant of the list walking macro. This was spotted by the Coverity checker (CID 1567). Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 27 2月, 2007 1 次提交
-
-
由 Roland Dreier 提交于
If path_rec_completion() is passed a non-NULL path record pointer along with an unsuccessful status value, the tracing code incorrectly prints the (invalid) DLID from the path record rather than the more interesting status code. The actual logic of the function correctly uses the path record only if the status indicates a successful lookup. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 11 2月, 2007 1 次提交
-
-
由 Michael S. Tsirkin 提交于
The following patch adds experimental support for IPoIB connected mode, as defined by the draft from the IETF ipoib working group. The idea is to increase performance by increasing the MTU from the maximum of 2K (theoretically 4K) supported by IPoIB on top of UD. With this code, I'm able to get 800MByte/sec or more with netperf without options on a Mellanox 4x back-to-back DDR system. Some notes on code: 1. SRQ is used for scalability to large cluster sizes 2. Only RC connections are used (UC does not support SRQ now) 3. Retry count is set to 0 since spec draft warns against retries 4. Each connection is used for data transfers in only 1 direction, so each connection is either active(TX) or passive (RX). 2 sides that want to communicate create 2 connections. 5. Each active (TX) connection has a separate CQ for send completions - this keeps the code simple without CQ resize and other tricks 6. To detect stale passive side connections (where the remote side is down), we keep an LRU list of passive connections (updated once per second per connection) and destroy a connection after it has been unused for several seconds. The LRU rule makes it possible to avoid scanning connections that have recently been active. Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 08 2月, 2007 1 次提交
-
-
由 Greg Kroah-Hartman 提交于
This lets the network core have the ability to handle suspend/resume issues, if it wants to. Thanks to Frederik Deweerdt <frederik.deweerdt@gmail.com> for the arm driver fixes. Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 13 12月, 2006 1 次提交
-
-
由 Roland Dreier 提交于
Move the initialization of ipoib_neigh's skb_queue into ipoib_neigh_alloc(), since commit 2745b5b7 ("IPoIB: Fix skb leak when freeing neighbour") will make iterate over the skb_queue to free any packets left over when freeing the ipoib_neigh structure. This fixes a crash when freeing ipoib_neigh structures allocated in ipoib_mcast_send(), which otherwise don't have their skb_queue initialized. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 30 11月, 2006 1 次提交
-
-
由 Michael S. Tsirkin 提交于
ipoib_neigh_free() is sometimes called while neighbour is still alive, so it might still have queued skbs. Fix skb leak in this case. Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 22 11月, 2006 1 次提交
-
-
由 David Howells 提交于
Fix up for make allyesconfig. Signed-Off-By: NDavid Howells <dhowells@redhat.com>
-
- 17 11月, 2006 1 次提交
-
-
由 Michael S. Tsirkin 提交于
IPoIB assumes that high (reserved) octet in the hardware address is 0, and copies it into the QPN. This violates RFC 4391 (which requires that the high 8 bits are ignored on receive), and will result in an invalid QPN being used when interoperating with IPoIB connected mode. Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 23 9月, 2006 5 次提交
-
-
由 Eli Cohen 提交于
Signed-off-by: NEli Cohen <eli@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Dotan Barak 提交于
IPoIB doesn't use anything from <linux/vmalloc.h>, so don't include it. Signed-off-by: NDotan Barak <dotanb@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Michael S. Tsirkin 提交于
Require users to register with SA module, to prevent the sa_query module text from going away while an SA query callback is still running. Update all in-tree users for the new interface. Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Tom Tucker 提交于
Modifications to the existing rdma header files, core files, drivers, and ulp files to support iWARP, including: - Hook iWARP CM into the build system and use it in rdma_cm. - Convert enum ib_node_type to enum rdma_node_type, which includes the possibility of RDMA_NODE_RNIC, and update everything for this. Signed-off-by: NTom Tucker <tom@opengridcomputing.com> Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Michael S. Tsirkin 提交于
Prevent flush task from freeing the ipoib_neigh pointer, while ipoib_start_xmit() is accessing the ipoib_neigh through the pointer it has loaded from the skb's hardware address. Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 25 7月, 2006 1 次提交
-
-
由 Michael S. Tsirkin 提交于
The neighbour ha field may get updated without destroying the neighbour. In this case, the ha field gets out of sync with the address handle stored in ipoib_neigh->ah, with the result that the ah field would point to an incorrect path, resulting in all packets being lost. Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 18 6月, 2006 1 次提交
-
-
由 Jack Morgenstein 提交于
Fix misaligned access faults on ia64: never cast a misaligned neighbour->ha + 4 pointer to union ib_gid type; pass a void * pointer instead. The memcpy was being optimized to use full word accesses because the compiler thought that union ib_gid is always aligned. The cast in IPOIB_GID_ARG is safe, since it is fixed to access each byte separately. Signed-off-by: NJack Morgenstein <jackm@mellanox.co.il> Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 11 4月, 2006 4 次提交
-
-
由 Roland Dreier 提交于
We know ipoib_flush_paths() is called from plain process context with interrupts enabled, since it does wait_for_completion(). So there's no need to use spin_lock_irqsave() -- spin_lock_irq() is fine. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
ib_sa_cancel_query() must be called with priv->lock held since a completion might arrive and set path->query to NULL. Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Shirley Ma 提交于
Make IPoIB's send and receive queue sizes tunable via module parameters ("send_queue_size" and "recv_queue_size"). This allows the queue sizes to be enlarged to fix disastrously bad performance on some platforms and workloads, without bloating memory usage when large queues aren't needed. Signed-off-by: NShirley Ma <xma@us.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Jack Morgenstein 提交于
Push translation of static rate to HCA format into low-level drivers, where it belongs. For static rate encoding, use encoding of rate field from IB standard PathRecord, with addition of value 0, for backwards compatibility with current usage. The changes are: - Add enum ib_rate to midlayer includes. - Get rid of static rate translation in IPoIB; just use static rate directly from Path and MulticastGroup records. - Update mthca driver to translate absolute static rate into the format used by hardware. This also fixes mthca's static rate handling for HCAs that are capable of 4X DDR. Signed-off-by: NJack Morgenstein <jackm@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 05 4月, 2006 1 次提交
-
-
由 Michael S. Tsirkin 提交于
Consolidate IPoIB's private neighbour data handling into ipoib_neigh_alloc() and ipoib_neigh_free(). This will make it easier to keep track of the neighbour structures that IPoIB is handling, and is a nice cleanup of the code: add/remove: 2/1 grow/shrink: 1/8 up/down: 100/-178 (-78) function old new delta ipoib_neigh_alloc - 61 +61 ipoib_neigh_free - 36 +36 ipoib_mcast_join_finish 1288 1291 +3 path_rec_completion 575 573 -2 ipoib_mcast_join_task 664 660 -4 ipoib_neigh_destructor 101 92 -9 ipoib_neigh_setup_dev 14 3 -11 ipoib_neigh_setup 17 - -17 path_free 238 215 -23 ipoib_mcast_free 329 306 -23 ipoib_mcast_send 718 684 -34 neigh_add_path 705 650 -55 Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 30 3月, 2006 1 次提交
-
-
由 Roland Dreier 提交于
ipoib_hard_header() needs to handle the case that daddr is NULL. This can happen when packets are injected via a raw socket, and IPoIB shouldn't oops in this case. Reported by Anton Blanchard <anton@samba.org> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 25 3月, 2006 1 次提交
-
-
由 Leonid Arsh 提交于
This patch causes the network interface to respond to P_Key change events correctly. As a result, you'll see a child interface in the "RUNNING" state (netif_carrier_on()) only when the corresponding P_Key is configured by the SM. When SM removes a P_Key, the "RUNNING" state will be disabled for the corresponding network interface. To implement this, I added IB_EVENT_PKEY_CHANGE event handling. To prevent flushing the device before the device is open by the "delay open" mechanism, I added an additional device flag called IPOIB_FLAG_INITIALIZED. This also prevents the child network interface from trying to join to multicast groups until the PKEY is configured. We used to get error messages like: ib0.f2f2: couldn't attach QP to multicast group ff12:401b:f2f2:0:0:0:ffff:ffff in this case. To fix this, I just check IPOIB_FLAG_OPER_UP flag in ipoib_set_mcast_list(). Signed-off-by: NLeonid Arsh <leonida@voltaire.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-