- 13 1月, 2017 4 次提交
-
-
由 Feras Daoud 提交于
Add a detailed return code to dev_queue_xmit function when calling to requeue packet via __skb_dequeue. Signed-off-by: NFeras Daoud <ferasda@mellanox.com> Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Feras Daoud 提交于
When calling set_mode from sys/fs, the call flow locks the sys/fs lock first and then tries to lock rtnl_lock (when calling ipoib_set_mod). On the other hand, the rmmod call flow takes the rtnl_lock first (when calling unregister_netdev) and then tries to take the sys/fs lock. Deadlock a->b, b->a. The problem starts when ipoib_set_mod frees it's rtnl_lck and tries to get it after that. set_mod: [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90 [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20 [<ffffffff814fed2b>] mutex_lock+0x2b/0x50 [<ffffffff81448675>] rtnl_lock+0x15/0x20 [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib] [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib] [<ffffffff8134b840>] dev_attr_store+0x20/0x30 [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170 [<ffffffff8117b068>] vfs_write+0xb8/0x1a0 [<ffffffff8117ba81>] sys_write+0x51/0x90 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b rmmod: [<ffffffff81279ffc>] ? put_dec+0x10c/0x110 [<ffffffff8127a2ee>] ? number+0x2ee/0x320 [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0 [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0 [<ffffffff8127b550>] ? string+0x40/0x100 [<ffffffff814fe323>] wait_for_common+0x123/0x180 [<ffffffff81060250>] ? default_wake_function+0x0/0x20 [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0 [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20 [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270 [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0 [<ffffffff81273f66>] kobject_del+0x16/0x40 [<ffffffff8134cd14>] device_del+0x184/0x1e0 [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0 [<ffffffff8143c05e>] rollback_registered+0xae/0x130 [<ffffffff8143c102>] unregister_netdevice+0x22/0x70 [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30 [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib] [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core] [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib] [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core] Fixes: 862096a8 ("IB/ipoib: Add more rtnl_link_ops callbacks") Cc: <stable@vger.kernel.org> # v3.6+ Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NFeras Daoud <ferasda@mellanox.com> Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Feras Daoud 提交于
When changing the connection mode, the ipoib_set_mode function did not check if the previous connection mode equals to the new one. This commit adds the required check and return 0 if the new mode equals to the previous one. Fixes: 839fcaba ("IPoIB: Connected mode experimental support") Signed-off-by: NFeras Daoud <ferasda@mellanox.com> Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Reviewed-by: NAlex Vesker <valex@mellanox.com> Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Feras Daoud 提交于
In datagram mode, the IB UD (Unreliable Datagram) transport is used so the MTU of the interface is equal to the IB L2 MTU minus the IPoIB encapsulation header. Any request to change the MTU value above the maximum range will change the MTU to the max allowed, but will not show any warning message. An ipoib_warn is issued in such cases, letting the user know that even though the value is legal, it can't be currently applied. Signed-off-by: NFeras Daoud <ferasda@mellanox.com> Signed-off-by: NNoa Osherovich <noaos@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 04 12月, 2016 1 次提交
-
-
由 Leon Romanovsky 提交于
The prints after [k|v][m|z|c]alloc() functions are not needed, because in case of failure, allocator will print their internal error prints anyway. Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 21 10月, 2016 1 次提交
-
-
由 Jarod Wilson 提交于
firewire-net: - set min/max_mtu - remove fwnet_change_mtu nes: - set max_mtu - clean up nes_netdev_change_mtu xpnet: - set min/max_mtu - remove xpnet_dev_change_mtu hippi: - set min/max_mtu - remove hippi_change_mtu batman-adv: - set max_mtu - remove batadv_interface_change_mtu - initialization is a little async, not 100% certain that max_mtu is set in the optimal place, don't have hardware to test with rionet: - set min/max_mtu - remove rionet_change_mtu slip: - set min/max_mtu - streamline sl_change_mtu um/net_kern: - remove pointless ndo_change_mtu hsi/clients/ssi_protocol: - use core MTU range checking - remove now redundant ssip_pn_set_mtu ipoib: - set a default max MTU value - Note: ipoib's actual max MTU can vary, depending on if the device is in connected mode or not, so we'll just set the max_mtu value to the max possible, and let the ndo_change_mtu function continue to validate any new MTU change requests with checks for CM or not. Note that ipoib has no min_mtu set, and thus, the network core's mtu > 0 check is the only lower bounds here. mptlan: - use net core MTU range checking - remove now redundant mpt_lan_change_mtu fddi: - min_mtu = 21, max_mtu = 4470 - remove now redundant fddi_change_mtu (including export) fjes: - min_mtu = 8192, max_mtu = 65536 - The max_mtu value is actually one over IP_MAX_MTU here, but the idea is to get past the core net MTU range checks so fjes_change_mtu can validate a new MTU against what it supports (see fjes_support_mtu in fjes_hw.c) hsr: - min_mtu = 0 (calls ether_setup, max_mtu is 1500) f_phonet: - min_mtu = 6, max_mtu = 65541 u_ether: - min_mtu = 14, max_mtu = 15412 phonet/pep-gprs: - min_mtu = 576, max_mtu = 65530 - remove redundant gprs_set_mtu CC: netdev@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: Stefan Richter <stefanr@s5r6.in-berlin.de> CC: Faisal Latif <faisal.latif@intel.com> CC: linux-rdma@vger.kernel.org CC: Cliff Whickman <cpw@sgi.com> CC: Robin Holt <robinmholt@gmail.com> CC: Jes Sorensen <jes@trained-monkey.org> CC: Marek Lindner <mareklindner@neomailbox.ch> CC: Simon Wunderlich <sw@simonwunderlich.de> CC: Antonio Quartulli <a@unstable.cc> CC: Sathya Prakash <sathya.prakash@broadcom.com> CC: Chaitra P B <chaitra.basappa@broadcom.com> CC: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> CC: MPT-FusionLinux.pdl@broadcom.com CC: Sebastian Reichel <sre@kernel.org> CC: Felipe Balbi <balbi@kernel.org> CC: Arvid Brodin <arvid.brodin@alten.se> CC: Remi Denis-Courmont <courmisch@gmail.com> Signed-off-by: NJarod Wilson <jarod@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 10月, 2016 1 次提交
-
-
由 David Ahern 提交于
Convert ipoib_get_net_dev_match_addr to the new upper device walk API. This is just a code conversion; no functional change is intended. v2 - removed typecast of data Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 10月, 2016 1 次提交
-
-
由 Paolo Abeni 提交于
After the commit 9207f9d4 ("net: preserve IP control block during GSO segmentation"), the GSO CB and the IPoIB CB conflict. That destroy the IPoIB address information cached there, causing a severe performance regression, as better described here: http://marc.info/?l=linux-kernel&m=146787279825501&w=2 This change moves the data cached by the IPoIB driver from the skb control lock into the IPoIB hard header, as done before the commit 936d7de3 ("IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses"). In order to avoid GRO issue, on packet reception, the IPoIB driver stash into the skb a dummy pseudo header, so that the received packets have actually a hard header matching the declared length. To avoid changing the connected mode maximum mtu, the allocated head buffer size is increased by the pseudo header length. After this commit, IPoIB performances are back to pre-regression value. v2 -> v3: rebased v1 -> v2: avoid changing the max mtu, increasing the head buf size Fixes: 9207f9d4 ("net: preserve IP control block during GSO segmentation") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 10月, 2016 1 次提交
-
-
由 Bhaktipriya Shridhar 提交于
alloc_ordered_workqueue() replaces deprecated create_singlethread_workqueue(). The workqueue "ipoib_workqueue" that is used for all flush operations for the device. WQ_MEM_RECLAIM has been set since the flush operations may need to complete in order for other network functions to continue, and the memory reclaim operation might need the network functioning in order to make progress. Signed-off-by: NBhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 03 9月, 2016 1 次提交
-
-
由 Erez Shitrit 提交于
When a new CM connection is being requested, ipoib driver copies data from the path pointer in the CM/tx object, the path object might be invalid at the point and memory corruption will happened later when now the CM driver will try using that data. The next scenario demonstrates it: neigh_add_path --> ipoib_cm_create_tx --> queue_work (pointer to path is in the cm/tx struct) #while the work is still in the queue, #the port goes down and causes the ipoib_flush_paths: ipoib_flush_paths --> path_free --> kfree(path) #at this point the work scheduled starts. ipoib_cm_tx_start --> copy from the (invalid)path pointer: (memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);) -> memory corruption. To fix that the driver now starts the CM/tx connection only if that specific path exists in the general paths database. This check is protected with the relevant locks, and uses the gid from the neigh member in the CM/tx object which is valid according to the ref count that was taken by the CM/tx. Fixes: 839fcaba ('IPoIB: Connected mode experimental support') Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 04 8月, 2016 1 次提交
-
-
由 Yuval Shaia 提交于
Decouple SG support from HW ability to do UD checksum. This coupling is for historical reasons and removed with 'commit ec5f0615 ("net: Kill link between CSUM and SG features.")' During driver load it is assumed that device does not supports SG. The final decision is taken after creating UD QP based on device capability. Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com> Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 07 6月, 2016 3 次提交
-
-
由 Erez Shitrit 提交于
ipoib_neigh_get unconditionally updates the "alive" variable member on any packet send. This prevents the neighbor garbage collection from cleaning out a dead neighbor entry if we are still queueing packets for it. If the queue for this neighbor is full, then don't update the alive timestamp. That way the neighbor can time out even if packets are still being queued as long as none of them are being sent. Fixes: b63b70d8 ("IPoIB: Use a private hash table for path lookup in xmit path") Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Mark Bloch 提交于
Align locking usage when touching device address with rest of the kernel. Lock the bottom half when doing so using netif_addr_lock_bh. This also solves the following case as reported by lockdep: CPU0 CPU1 ---- ---- lock(_xmit_INFINIBAND); local_irq_disable(); lock(&(&mc->mca_lock)->rlock); lock(_xmit_INFINIBAND); <Interrupt> lock(&(&mc->mca_lock)->rlock); *** DEADLOCK *** Fixes: 492a7e67 ("IB/IPoIB: Allow setting the device address") Signed-off-by: NMark Bloch <markb@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Erez Shitrit 提交于
In ipoib_remove_one the driver holds the rtnl_lock and tries to do some operation like dev_change_flags or unregister_netdev, while sysfs callback like ipoib_vlan_delete holds sysfs mutex and tries to hold the rtnl_lock via rtnl_trylock() and restart_syscall() if the lock is not free, meanwhile ipoib_remove_one tries to get the sysfs lock in order to free its sysfs directory, and we will get a->b, b->a deadlock. Trace like the following: schedule+0x37/0x80 schedule_preempt_disabled+0xe/0x10 __mutex_lock_slowpath+0xb5/0x120 mutex_lock+0x23/0x40 rtnl_lock+0x15/0x20 netdev_run_todo+0x17c/0x320 rtnl_unlock+0xe/0x10 ipoib_vlan_delete+0x11b/0x1b0 [ib_ipoib] delete_child+0x54/0x80 [ib_ipoib] dev_attr_store+0x18/0x30 sysfs_kf_write+0x37/0x40 mutex_lock+0x16/0x40 SyS_write+0x55/0xc0 entry_SYSCALL_64_fastpath+0x16/0x75 And schedule+0x37/0x80 __kernfs_remove+0x1a8/0x260 ? wake_atomic_t_function+0x60/0x60 kernfs_remove+0x25/0x40 sysfs_remove_dir+0x50/0x80 kobject_del+0x18/0x50 device_del+0x19f/0x260 netdev_unregister_kobject+0x6a/0x80 rollback_registered_many+0x1fd/0x340 rollback_registered+0x3c/0x70 unregister_netdevice_queue+0x55/0xc0 unregister_netdev+0x20/0x30 ipoib_remove_one+0x114/0x1b0 [ib_ipoib] ib_unregister_client+0x4a/0x170 [ib_core] ? find_module_all+0x71/0xa0 ipoib_cleanup_module+0x10/0x94 [ib_ipoib] SyS_delete_module+0x1b5/0x210 entry_SYSCALL_64_fastpath+0x16/0x75 The fix is by checking the flag IPOIB_FLAG_INTF_ON_DESTROY in order to get out from the sysfs function. Fixes: 862096a8 ("IB/ipoib: Add more rtnl_link_ops callbacks") Fixes: 9baa0b03 ("IB/ipoib: Add rtnl_link_ops support") Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 26 5月, 2016 2 次提交
-
-
由 Mark Bloch 提交于
In IB networks, and specifically in IPoIB/rdmacm traffic, the device address of an IPoIB interface is used as a means to exchange information between nodes needed for communication. Currently an IPoIB interface will always be created with a device address based on its node GUID without a way to change that. This change adds the ability to set the device address of an IPoIB interface by value. We use the set mac address ndo to do that. The flow should be broken down to two: 1) The GID value is already in the GID table, in this case the interface will be able to set carrier up. 2) The GID value is not yet in the GID table, in this case the interface won't try to join the multicast group and will wait (listen on GID_CHANGE event) until the GID is inserted. In order to track those changes, we add a new flag: * IPOIB_FLAG_DEV_ADDR_SET. When set, it means the dev_addr is a based on a value in the gid table. this bit will be cleared upon a dev_addr change triggered by the user and set after validation. Per IB spec the port GUID can't change if the module is loaded. port GUID is the basis for GID at index 0 which is the basis for the default device address of a ipoib interface. The issue is that there are devices that don't follow the spec, they change the port GUID while HCA is powered on, so in order not to break userspace applications. We need to check if the user wanted to control the device address and we assume that if he sets the device address back to be based on GID index 0, he no longer wishs to control it. In order to track this, we add an additional flag: * IPOIB_FLAG_DEV_ADDR_CTRL When setting the device address, there is no validation of the upper twelve bytes of the device address (flags, qpn, subnet prefix) as those bytes are not under the control of the user. Signed-off-by: NMark Bloch <markb@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Erez Shitrit 提交于
Check (via an SA query) if the SM supports the new option for SendOnly multicast joins. If the SM supports that option it will use the new join state to create such multicast group. If SendOnlyFullMember is supported, we wouldn't use faked FullMember state join for SendOnly MCG, use the correct state if supported. This check is performed at every invocation of mcast_restart task, to be sure that the driver stays in sync with the current state of the SM. Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 05 5月, 2016 1 次提交
-
-
由 Florian Westphal 提交于
a trans_start struct member exists twice: - in struct net_device (legacy) - in struct netdev_queue Instead of open-coding dev->trans_start usage to obtain the current trans_start value, use dev_trans_start() instead. This is not exactly the same, as dev_trans_start also considers the trans_start values of the netdev queues owned by the device and provides the most recent one. For legacy devices this doesn't matter as dev_trans_start can cope with netdev trans_start values of 0 (they are ignored). This is a prerequisite to eventual removal of dev->trans_start. Cc: linux-rdma@vger.kernel.org Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 3月, 2016 1 次提交
-
-
由 Eli Cohen 提交于
Add ndo operations to the network driver that enables configuring the following operations: ipoib_set_vf_link_state - configure the VF link policy ipoib_get_vf_config - get link state configuration ipoib_set_vf_guid - set a VF port or node GUID ipoib_get_vf_stats - get statistics of a VF Signed-off-by: NEli Cohen <eli@mellanox.com> Reviewed-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 20 1月, 2016 1 次提交
-
-
由 Erez Shitrit 提交于
ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the parameter mcast->dev. That mcast is a temporary (used as an iterator) variable that may be uninitialized. There is no need to send the variable dev to the function, as each mcast has its dev as a member in the mcast struct. This causes the next panic: RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib] RSP: 0018: EFLAGS: 00010246 RAX: f0201 RBX: 24e00 RCX: 00000 .... .... Stack: Call Trace: ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib] ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib] process_one_work+0x164/0x470 worker_thread+0x11d/0x420 ... Fixes: 5a0e81f6 ('IB/IPoIB: factor out common multicast list removal code') Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Reported-by: NDoron Tsur <doront@mellanox.com> Reviewed-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 24 12月, 2015 2 次提交
-
-
由 Christoph Lameter 提交于
Code cleanup to move multicast specific code that checks for a sendonly join to ipoib_multicast.c. This allows the removal of the export of __ipoib_mcast_find(). Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Christoph Lameter 提交于
Code cleanup to remove multicast specific code from ipoib_main.c The removal of a list of multicast groups occurs in three places. Create a new function ipoib_mcast_remove_list(). Use this new function in ipoib_main.c too. That in turn allows the dropping of two functions that were exported from ipoib_multicast.c for expiration of mc groups. Reviewed-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 23 12月, 2015 1 次提交
-
-
由 Or Gerlitz 提交于
Instead, use the cached copy of the attributes present on the device. Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 22 10月, 2015 1 次提交
-
-
由 Matan Barak 提交于
Adding an ability to query the IB cache by a netdev and get the attributes of a GID. These parameters are necessary in order to successfully resolve the required GID (when the netdevice is known) and get the Ethernet L2 attributes from a GID. Signed-off-by: NMatan Barak <matanb@mellanox.com> Reviewed-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 14 10月, 2015 1 次提交
-
-
由 Christoph Lameter 提交于
When we leave the multicast group on expiration of a neighbor we do not free the mcast structure. This results in a memory leak that causes ib_dealloc_pd to fail and print a WARN_ON message and backtrace. Fixes: bd99b2e0 (IB/ipoib: Expire sendonly multicast joins) Signed-off-by: NChristoph Lameter <cl@linux.com> Tested-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 08 10月, 2015 1 次提交
-
-
由 Christoph Hellwig 提交于
This patch split up struct ib_send_wr so that all non-trivial verbs use their own structure which embedds struct ib_send_wr. This dramaticly shrinks the size of a WR for most common operations: sizeof(struct ib_send_wr) (old): 96 sizeof(struct ib_send_wr): 48 sizeof(struct ib_rdma_wr): 64 sizeof(struct ib_atomic_wr): 96 sizeof(struct ib_ud_wr): 88 sizeof(struct ib_fast_reg_wr): 88 sizeof(struct ib_bind_mw_wr): 96 sizeof(struct ib_sig_handover_wr): 80 And with Sagi's pending MR rework the fast registration WR will also be down to a reasonable size: sizeof(struct ib_fastreg_wr): 64 Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt] Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc] Tested-by: NHaggai Eran <haggaie@mellanox.com> Tested-by: NSagi Grimberg <sagig@mellanox.com> Tested-by: NSteve Wise <swise@opengridcomputing.com>
-
- 26 9月, 2015 1 次提交
-
-
由 Christoph Lameter 提交于
On neighbor expiration, check to see if the neighbor was actually a sendonly multicast join, and if so, leave the multicast group as we expire the neighbor. Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 31 8月, 2015 2 次提交
-
-
由 Guy Shapiro 提交于
Implement the get_net_device_by_port_pkey_ip callback that returns network device to ib_core according to connection parameters. Check the ipoib device and iterate over all child devices to look for a match. For each IPoIB device we iterate through all upper devices when searching for a matching IP, in order to support bonding. Signed-off-by: NGuy Shapiro <guysh@mellanox.com> Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Signed-off-by: NYotam Kenneth <yotamke@mellanox.com> Signed-off-by: NShachar Raindel <raindel@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Haggai Eran 提交于
An ib_client callback that is called with the lists_rwsem locked only for read is protected from changes to the IB client lists, but not from ib_unregister_device() freeing its client data. This is because ib_unregister_device() will remove the device from the device list with lists_rwsem locked for write, but perform the rest of the cleanup, including the call to remove() without that lock. Mark client data that is undergoing de-registration with a new going_down flag in the client data context. Lock the client data list with lists_rwsem for write in addition to using the spinlock, so that functions calling the callback would be able to lock only lists_rwsem for read and let callbacks sleep. Since ib_unregister_client() now marks the client data context, no need for remove() to search the context again, so pass the client data directly to remove() callbacks. Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 15 7月, 2015 4 次提交
-
-
由 Erez Shitrit 提交于
When switching between modes (datagram / connected) change the MTU accordingly. datagram mode up to 4K, connected mode up to (64K - 0x10). Signed-off-by: NELi Cohen <eli@mellanox.com> Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Yuval Shaia 提交于
By default, IPoIB-CM driver uses 64k MTU. Larger MTU gives better performance. This MTU plus overhead puts the memory allocation for IP based packets at 32 4k pages (order 5), which have to be contiguous. When the system memory under pressure, it was observed that allocating 128k contiguous physical memory is difficult and causes serious errors (such as system becomes unusable). This enhancement resolve the issue by removing the physically contiguous memory requirement using Scatter/Gather feature that exists in Linux stack. With this fix Scatter-Gather will be supported also in connected mode. This change reverts some of the change made in commit e112373f ("IPoIB/cm: Reduce connected mode TX object size"). The ability to use SG in IPoIB CM is possible because the coupling between NETIF_F_SG and NETIF_F_CSUM was removed in commit ec5f0615 ("net: Kill link between CSUM and SG features.") Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com> Acked-by: NChristian Marie <christian@ponies.io> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Amir Vadai 提交于
Error values of ib_query_port() and ib_query_device() weren't propagated correctly. Because of that, ipoib_add_port() could return NULL value, which escaped the IS_ERR() check in ipoib_add_one() and we crashed. Signed-off-by: NAmir Vadai <amirv@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Reviewed-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Hal Rosenstock 提交于
Persuant to Liran's comments on node_type on linux-rdma mailing list: In an effort to reform the RDMA core and ULPs to minimize use of node_type in struct ib_device, an additional bit is added to struct ib_device for is_switch (IB switch). This is needed to be initialized by any IB switch device driver. This is a NEW requirement on such device drivers which are all "out of tree". In addition, an ib_switch helper was added to ib_verbs.h based on the is_switch device bit rather than node_type (although those should be consistent). The RDMA core (MAD, SMI, agent, sa_query, multicast, sysfs) as well as (IPoIB and SRP) ULPs are updated where appropriate to use this new helper. In some cases, the helper is now used under the covers of using rdma_[start end]_port rather than the open coding previously used. Reviewed-by: NSean Hefty <sean.hefty@intel.com> Reviewed-By: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NIra Weiny <ira.weiny@intel.com> Tested-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NHal Rosenstock <hal@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 02 6月, 2015 1 次提交
-
-
由 Bart Van Assche 提交于
Avoid that sparse complains about ipoib_neigh_hash_init(). This patch does not change any functionality. See also patch "IPoIB: Fix memory leak in the neigh table deletion flow" (commit ID 66172c09). Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Cc: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 19 5月, 2015 1 次提交
-
-
由 Michael Wang 提交于
Use raw management helpers to reform IB-ulp ipoib. Signed-off-by: NMichael Wang <yun.wang@profitbricks.com> Reviewed-by: NIra Weiny <ira.weiny@intel.com> Tested-by: NIra Weiny <ira.weiny@intel.com> Reviewed-by: NSean Hefty <sean.hefty@intel.com> Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: NDoug Ledford <dledford@redhat.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 18 4月, 2015 1 次提交
-
-
由 Erez Shitrit 提交于
Currently, iflink of the parent interface was always accessed, even when interface didn't have a parent and hence we crashed there. Handle the interface types properly: for a child interface, return the ifindex of the parent, for parent interface, return its ifindex. For child devices, make sure to set the parent pointer prior to invoking register_netdevice(), this allows the new ndo to be called by the stack immediately after the child device is registered. Fixes: 5aa7add8 ('infiniband/ipoib: implement ndo_get_iflink') Reported-by: NHonggang Li <honli@redhat.com> Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Signed-off-by: NHonggang Li <honli@redhat.com> Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>+ Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 4月, 2015 4 次提交
-
-
由 Erez Shitrit 提交于
Whenever there is no path->ah to the destination, keep only defined number of skb's. Otherwise there are cases that the driver can keep infinite list of skb's. For example, when one device want to send unicast arp to the destination, and from some reason the SM doesn't respond, the driver currently keeps all the skb's. If that unicast arp traffic stopped, all these skb's are kept by the path object till the interface is down. Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Doug Ledford 提交于
Various places in the IPoIB code had a deadlock related to flushing the ipoib workqueue. Now that we have per device workqueues and a specific flush workqueue, there is no longer a deadlock issue with flushing the device specific workqueues and we can do so unilaterally. Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Doug Ledford 提交于
During my recent work on the rtnl lock deadlock in the IPoIB driver, I saw that even once I fixed the apparent races for a single device, as soon as that device had any children, new races popped up. It turns out that this is because no matter how well we protect against races on a single device, the fact that all devices use the same workqueue, and flush_workqueue() flushes *everything* from that workqueue means that we would also have to prevent all races between different devices (for instance, ipoib_mcast_restart_task on interface ib0 can race with ipoib_mcast_flush_dev on interface ib0.8002, resulting in a deadlock on the rtnl_lock). There are several possible solutions to this problem: Make carrier_on_task and mcast_restart_task try to take the rtnl for some set period of time and if they fail, then bail. This runs the real risk of dropping work on the floor, which can end up being its own separate kind of deadlock. Set some global flag in the driver that says some device is in the middle of going down, letting all tasks know to bail. Again, this can drop work on the floor. Or the method this patch attempts to use, which is when we bring an interface up, create a workqueue specifically for that interface, so that when we take it back down, we are flushing only those tasks associated with our interface. In addition, keep the global workqueue, but now limit it to only flush tasks. In this way, the flush tasks can always flush the device specific work queues without having deadlock issues. Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Doug Ledford 提交于
In preparation for using per device work queues, we need to move the start of the neighbor thread task to after ipoib_ib_dev_init and move the destruction of the neighbor task to before ipoib_ib_dev_cleanup. Otherwise we will end up freeing our workqueue with work possibly still on it. Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 03 4月, 2015 1 次提交
-
-
由 Nicolas Dichtel 提交于
Don't use dev->iflink anymore. CC: Roland Dreier <roland@kernel.org> Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-