Commits · 50174a7f2c24d13cdeec435ee1ba70b1e0b1318f · openeuler / Kernel

20 Jan, 2016: 1 commit

IB/IPoIB: Fix kernel panic on multicast flow · 50be28de

Committed by Erez Shitrit on Jan 07, 2016

ipoib_mcast_restart_task() calls ipoib_mcast_remove_list() with the
parameter mcast->dev. That mcast is a temporary variable (used as an
iterator) that may be uninitialized.
There is no need to pass dev to the function at all, because each mcast
stores its own dev as a member of the mcast struct.

This causes the following panic:
RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib]
RSP: 0018: EFLAGS: 00010246
RAX: f0201 RBX: 24e00 RCX: 00000
....
....
Stack:
Call Trace:
	ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib]
	ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib]
	process_one_work+0x164/0x470
	worker_thread+0x11d/0x420
	...

Fixes: 5a0e81f6 ('IB/IPoIB: factor out common multicast list removal code')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reported-by: Doron Tsur <doront@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
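
The pattern behind this fix can be illustrated with a small userspace sketch. The structures below (`struct net_dev`, `struct mcast`, `mcast_leave()`) are hypothetical, simplified stand-ins, not the driver's code. The only fact carried over from the commit message is that each multicast entry records its own device. The fixed helper takes just the list and reads `dev` from each entry, so it never depends on a caller's loop iterator that may not point at a valid entry.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct net_dev {
	const char *name;
	int leaves;              /* counts leave operations, for illustration */
};

struct mcast {
	struct net_dev *dev;     /* each entry records its own device */
	struct mcast *next;
};

/* Hypothetical leave operation: everything it needs is in the entry. */
static void mcast_leave(struct mcast *mcast)
{
	mcast->dev->leaves++;
}

/*
 * Fixed shape: there is no separate dev parameter. The buggy caller
 * passed mcast->dev taken from its own loop iterator, which may not
 * refer to a valid entry once that loop has finished.
 */
static void mcast_remove_list(struct mcast *head)
{
	for (struct mcast *m = head; m != NULL; m = m->next)
		mcast_leave(m);      /* uses m->dev, never a caller's iterator */
}
```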

24 Dec, 2015: 2 commits

IB/IPoIB: Move multicast specific code out of ipoib_main.c · 432c55ff

Committed by Christoph Lameter on Dec 21, 2015

Code cleanup to move multicast specific code that checks for
a sendonly join to ipoib_multicast.c. This allows the removal
of the export of __ipoib_mcast_find().
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/IPoIB: factor out common multicast list removal code · 5a0e81f6

Committed by Christoph Lameter on Dec 21, 2015

Code cleanup to remove multicast-specific code from ipoib_main.c.

The removal of a list of multicast groups occurs in three places.
Create a new function, ipoib_mcast_remove_list(), and use it
in ipoib_main.c too.
That in turn allows dropping two functions that were
exported from ipoib_multicast.c for expiring multicast groups.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

23 Dec, 2015: 1 commit

IB/ulps: Avoid calling ib_query_device · 4a061b28

Committed by Or Gerlitz on Dec 18, 2015

Instead, use the cached copy of the attributes present on the device.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

22 Oct, 2015: 1 commit

IB/core: Add netdev and gid attributes paramteres to cache · 55ee3ab2

Committed by Matan Barak on Oct 15, 2015

Add the ability to query the IB cache by netdev and to get the
attributes of a GID. These parameters are needed in order to
resolve the required GID (when the net device is known)
and to get the Ethernet L2 attributes from a GID.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

14 Oct, 2015: 1 commit

IB/ipoib: For sendonly join free the multicast group on leave · 0b5c9279

Committed by Christoph Lameter on Oct 11, 2015

When we leave the multicast group on expiration of a neighbor, we
do not free the mcast structure. This results in a memory leak
that causes ib_dealloc_pd() to fail and print a WARN_ON message
and backtrace.

Fixes: bd99b2e0 (IB/ipoib: Expire sendonly multicast joins)
Signed-off-by: Christoph Lameter <cl@linux.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

08 Oct, 2015: 1 commit

IB: split struct ib_send_wr · e622f2f4

Committed by Christoph Hellwig on Oct 08, 2015

This patch splits up struct ib_send_wr so that all non-trivial verbs
use their own structure, which embeds struct ib_send_wr. This dramatically
shrinks the size of a WR for most common operations:

sizeof(struct ib_send_wr) (old):	96

sizeof(struct ib_send_wr):		48
sizeof(struct ib_rdma_wr):		64
sizeof(struct ib_atomic_wr):		96
sizeof(struct ib_ud_wr):		88
sizeof(struct ib_fast_reg_wr):		88
sizeof(struct ib_bind_mw_wr):		96
sizeof(struct ib_sig_handover_wr):	80

And with Sagi's pending MR rework the fast registration WR will also be
down to a reasonable size:

sizeof(struct ib_fastreg_wr):		64
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
Tested-by: Haggai Eran <haggaie@mellanox.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
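
The split relies on a common kernel idiom: a small base structure is embedded in larger, operation-specific structures, and `container_of()` recovers the outer structure from the base pointer. Below is a minimal userspace sketch of that idiom. The `send_wr`/`rdma_wr` types are hypothetical and only loosely modeled on `struct ib_send_wr`/`struct ib_rdma_wr`, and their field lists are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace version of the kernel's container_of() idiom. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum wr_opcode { OP_SEND, OP_RDMA_WRITE };

/* Hypothetical base request: only the fields every operation needs. */
struct send_wr {
	struct send_wr *next;
	uint64_t wr_id;
	enum wr_opcode opcode;
};

/* Hypothetical RDMA request: the base is embedded, extra fields follow. */
struct rdma_wr {
	struct send_wr wr;
	uint64_t remote_addr;
	uint32_t rkey;
};

/* A consumer handed only the base pointer recovers the outer struct. */
static uint64_t remote_addr_of(struct send_wr *wr)
{
	if (wr->opcode != OP_RDMA_WRITE)
		return 0;    /* plain sends carry no remote address */
	return container_of(wr, struct rdma_wr, wr)->remote_addr;
}
```

Only requests that need the extra fields pay for them. That is where the size reductions listed above come from.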

26 Sep, 2015: 1 commit

IB/ipoib: Expire sendonly multicast joins · bd99b2e0

Committed by Christoph Lameter on Sep 24, 2015

On neighbor expiration, check to see if the neighbor was actually a
sendonly multicast join, and if so, leave the multicast group as we
expire the neighbor.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

31 Aug, 2015: 2 commits

IB/ipoib: Return IPoIB devices matching connection parameters · ddde896e

Committed by Guy Shapiro on Jul 30, 2015

Implement the get_net_device_by_port_pkey_ip callback, which returns a network
device to ib_core according to the connection parameters. Check the ipoib
device and iterate over all child devices to look for a match.

For each IPoIB device, we iterate through all upper devices when searching
for a matching IP, in order to support bonding.
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: lock client data with lists_rwsem · 7c1eb45a

Committed by Haggai Eran on Jul 30, 2015

An ib_client callback that is called with the lists_rwsem locked only for
read is protected from changes to the IB client lists, but not from
ib_unregister_device() freeing its client data. This is because
ib_unregister_device() will remove the device from the device list with
lists_rwsem locked for write, but perform the rest of the cleanup,
including the call to remove() without that lock.

Mark client data that is undergoing de-registration with a new going_down
flag in the client data context. Lock the client data list with lists_rwsem
for write in addition to using the spinlock, so that functions calling the
callback would be able to lock only lists_rwsem for read and let callbacks
sleep.

Since ib_unregister_client() now marks the client data context, there is no
need for remove() to search for the context again, so pass the client data
directly to remove() callbacks.
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

15 Jul, 2015: 4 commits

IB/ipoib: Set MTU to max allowed by mode when mode changes · edcd2a74

Committed by Erez Shitrit on Jun 07, 2015

When switching between modes (datagram/connected), change the MTU
accordingly: datagram mode allows up to 4K, and connected mode allows up to
(64K - 0x10).
Signed-off-by: ELi Cohen <eli@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
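
A minimal sketch of the rule stated above. It assumes the limits are exactly the ones quoted in the commit message (4K for datagram mode, 64K - 0x10 for connected mode); the values are not read from the driver source. The enum and function names are hypothetical. Note that 64K - 0x10 works out to 65520 bytes.

```c
/* Limits as quoted in the commit message (assumed, not from the driver). */
#define DATAGRAM_MAX_MTU   4096u              /* "up to 4K" */
#define CONNECTED_MAX_MTU  (0x10000u - 0x10u) /* 64K - 0x10 = 65520 */

enum ipoib_mode { MODE_DATAGRAM, MODE_CONNECTED };

/* MTU to apply right after switching to new_mode: that mode's maximum. */
static unsigned int mtu_for_mode(enum ipoib_mode new_mode)
{
	return new_mode == MODE_CONNECTED ? CONNECTED_MAX_MTU
					  : DATAGRAM_MAX_MTU;
}
```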

IB/ipoib: Scatter-Gather support in connected mode · c4268778

Committed by Yuval Shaia on Jul 12, 2015

By default, the IPoIB-CM driver uses a 64K MTU. A larger MTU gives better
performance.
This MTU plus overhead puts the memory allocation for IP-based packets at
32 4K pages (order 5), which have to be contiguous.
When system memory is under pressure, it was observed that allocating 128K
of contiguous physical memory is difficult and causes serious errors (such as
the system becoming unusable).

This enhancement resolves the issue by removing the physically contiguous
memory requirement, using the Scatter/Gather feature that exists in the Linux stack.

With this fix, Scatter-Gather is also supported in connected mode.

This change reverts some of the changes made in commit e112373f
("IPoIB/cm: Reduce connected mode TX object size").

Using SG in IPoIB CM is possible because the coupling
between NETIF_F_SG and NETIF_F_CSUM was removed in commit
ec5f0615 ("net: Kill link between CSUM and SG features.")
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Christian Marie <christian@ponies.io>
Signed-off-by: Doug Ledford <dledford@redhat.com>
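
The "order 5" arithmetic can be checked with a small helper. It computes the same quantity as the kernel's `get_order()`: the smallest order such that `PAGE_SIZE << order` covers the request. The helper also counts how many separate pages the same buffer needs once it may be scattered. The 4 KiB page size and the hypothetical 64-byte overhead on top of a 64K buffer are assumptions for illustration; the commit does not give the exact overhead.

```c
#include <stddef.h>

#define PAGE_BYTES 4096u   /* assumed 4 KiB page size */

/* Smallest order such that (PAGE_BYTES << order) >= size, like get_order(). */
static unsigned int alloc_order(size_t size)
{
	unsigned int order = 0;

	while (((size_t)PAGE_BYTES << order) < size)
		order++;
	return order;
}

/* Number of separate pages needed if the buffer may be scattered. */
static size_t sg_pages(size_t size)
{
	return (size + PAGE_BYTES - 1) / PAGE_BYTES;
}
```

Under these assumptions, a 65600-byte buffer needs an order-5 block (32 physically contiguous pages, 128K) as a single allocation. With scatter-gather it needs only 17 individual pages.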

IB/IPoIB: Fix bad error flow in ipoib_add_port() · 58e9cc90

Committed by Amir Vadai on Jul 01, 2015

Error values of ib_query_port() and ib_query_device() weren't propagated
correctly. Because of that, ipoib_add_port() could return a NULL value,
which escaped the IS_ERR() check in ipoib_add_one() and caused a crash.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
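
Why a NULL return slips past `IS_ERR()` can be shown with a userspace re-creation of the kernel's error-pointer convention. Only the top `MAX_ERRNO` addresses encode errors, so NULL is not one of them. The two `add_port_*` variants and `port_object` are hypothetical; they only contrast "error lost as NULL" with "error propagated as `ERR_PTR()`".

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer convention. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Only the top MAX_ERRNO addresses encode errors; NULL does not. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int port_object;   /* stands in for a successfully created port */

/* Hypothetical add_port() variants: query_rc is the result of the queries. */
static void *add_port_buggy(int query_rc)
{
	if (query_rc)
		return NULL;               /* error lost: IS_ERR(NULL) is false */
	return &port_object;
}

static void *add_port_fixed(int query_rc)
{
	if (query_rc)
		return ERR_PTR(query_rc);  /* error propagated to the caller */
	return &port_object;
}
```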

IB: Add rdma_cap_ib_switch helper and use where appropriate · 4139032b

Committed by Hal Rosenstock on Jun 29, 2015

Pursuant to Liran's comments on node_type on the linux-rdma
mailing list:

In an effort to reform the RDMA core and ULPs to minimize use of
node_type in struct ib_device, an additional bit is added to
struct ib_device for is_switch (IB switch). It must be
initialized by any IB switch device driver. This is a
NEW requirement on such device drivers, which are all
"out of tree".

In addition, an ib_switch helper was added to ib_verbs.h
based on the is_switch device bit rather than node_type
(although those should be consistent).

The RDMA core (MAD, SMI, agent, sa_query, multicast, sysfs)
as well as the IPoIB and SRP ULPs are updated where
appropriate to use this new helper. In some cases,
the helper is now used under the covers of
rdma_[start end]_port rather than the open coding
previously used.
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
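
The helper and its use in port iteration can be sketched as follows. `struct ib_dev` is a hypothetical, trimmed-down stand-in for `struct ib_device`. The start/end port logic is modeled on how the kernel's `rdma_start_port()`/`rdma_end_port()` treat switches (a switch is addressed only through port 0). Treat it as an illustration rather than the exact upstream code.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down device description. */
struct ib_dev {
	unsigned int is_switch : 1;   /* set by IB switch drivers */
	unsigned int phys_port_cnt;
};

static bool cap_ib_switch(const struct ib_dev *d)
{
	return d->is_switch;
}

/* A switch is managed through port 0; an HCA's ports are numbered from 1. */
static unsigned int start_port(const struct ib_dev *d)
{
	return cap_ib_switch(d) ? 0 : 1;
}

static unsigned int end_port(const struct ib_dev *d)
{
	return cap_ib_switch(d) ? 0 : d->phys_port_cnt;
}
```

A caller then loops `for (p = start_port(d); p <= end_port(d); p++)`. This visits port 0 on a switch and ports 1..N on an HCA, without testing node_type.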

02 Jun, 2015: 1 commit

IB/ipoib: Fix RCU annotations in ipoib_neigh_hash_init() · 52374967

Committed by Bart Van Assche on May 26, 2015

Prevent sparse from complaining about ipoib_neigh_hash_init(). This
patch does not change any functionality. See also the patch "IPoIB:
Fix memory leak in the neigh table deletion flow" (commit ID
66172c09).
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

19 May, 2015: 1 commit

IB/Verbs: Reform IB-ulp ipoib · 8e37ab68

Committed by Michael Wang on May 05, 2015

Use raw management helpers to reform IB-ulp ipoib.
Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

18 Apr, 2015: 1 commit

IB/ipoib: Fix ndo_get_iflink · 2c153959

Committed by Erez Shitrit on Apr 16, 2015

Currently, the iflink of the parent interface is always accessed, even
when the interface doesn't have a parent, and hence we crash there.

Handle the interface types properly: for a child interface, return
the ifindex of the parent; for a parent interface, return its own ifindex.

For child devices, make sure to set the parent pointer prior to
invoking register_netdevice(). This allows the new ndo to be called
by the stack immediately after the child device is registered.

Fixes: 5aa7add8 ('infiniband/ipoib: implement ndo_get_iflink')
Reported-by: Honggang Li <honli@redhat.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Honggang Li <honli@redhat.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Apr, 2015: 4 commits

IB/ipoib: Save only IPOIB_MAX_PATH_REC_QUEUE skb's · 1e85b806

Committed by Erez Shitrit on Apr 02, 2015

Whenever there is no path->ah to the destination, keep only a defined
number of skb's. Otherwise, there are cases in which the driver can keep
an unbounded list of skb's.

For example, when one device wants to send unicast ARP to the destination
and for some reason the SM doesn't respond, the driver currently keeps
all the skb's. If that unicast ARP traffic stops, all these skb's
are kept by the path object until the interface goes down.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
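
The bounded-queue idea can be sketched as follows. `struct path_queue`, `struct pkt` and `path_enqueue()` are hypothetical, and the limit of 3 is an assumed value used only for illustration. The real limit is whatever `IPOIB_MAX_PATH_REC_QUEUE` is defined as in the driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit for illustration; the driver uses IPOIB_MAX_PATH_REC_QUEUE. */
#define MAX_PATH_REC_QUEUE 3

struct pkt { int id; };

/* Hypothetical per-path holding queue used while path->ah is missing. */
struct path_queue {
	struct pkt *slots[MAX_PATH_REC_QUEUE];
	size_t len;
	size_t dropped;
};

/* Queue the packet if there is room; otherwise count it as dropped. */
static bool path_enqueue(struct path_queue *q, struct pkt *p)
{
	if (q->len >= MAX_PATH_REC_QUEUE) {
		q->dropped++;       /* the caller would free the packet here */
		return false;
	}
	q->slots[q->len++] = p;
	return true;
}
```

With a fixed cap, an unresponsive SM costs at most a few queued packets per path instead of an ever-growing list.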

IB/ipoib: No longer use flush as a parameter · efc82eee

Committed by Doug Ledford on Feb 21, 2015

Various places in the IPoIB code had a deadlock related to flushing
the ipoib workqueue. Now that we have per device workqueues and a
specific flush workqueue, there is no longer a deadlock issue with
flushing the device specific workqueues and we can do so unilaterally.
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/ipoib: Use dedicated workqueues per interface · 0b39578b

Committed by Doug Ledford on Feb 21, 2015

During my recent work on the rtnl lock deadlock in the IPoIB driver, I
saw that even once I fixed the apparent races for a single device, as
soon as that device had any children, new races popped up.  It turns
out that this is because no matter how well we protect against races
on a single device, the fact that all devices use the same workqueue,
and flush_workqueue() flushes *everything* from that workqueue means
that we would also have to prevent all races between different devices
(for instance, ipoib_mcast_restart_task on interface ib0 can race with
ipoib_mcast_flush_dev on interface ib0.8002, resulting in a deadlock on
the rtnl_lock).

There are several possible solutions to this problem:

Make carrier_on_task and mcast_restart_task try to take the rtnl for
some set period of time and if they fail, then bail.  This runs the
real risk of dropping work on the floor, which can end up being its
own separate kind of deadlock.

Set some global flag in the driver that says some device is in the
middle of going down, letting all tasks know to bail.  Again, this can
drop work on the floor.

Or the method this patch attempts to use, which is when we bring an
interface up, create a workqueue specifically for that interface, so
that when we take it back down, we are flushing only those tasks
associated with our interface.  In addition, keep the global
workqueue, but now limit it to only flush tasks.  In this way, the
flush tasks can always flush the device specific work queues without
having deadlock issues.
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/ipoib: change init sequence ordering · be7aa663

Committed by Doug Ledford on Feb 21, 2015

In preparation for using per device work queues, we need to move the
start of the neighbor thread task to after ipoib_ib_dev_init and move
the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
Otherwise we will end up freeing our workqueue with work possibly
still on it.
Signed-off-by: Doug Ledford <dledford@redhat.com>

03 Apr, 2015: 1 commit

infiniband/ipoib: implement ndo_get_iflink · 5aa7add8

Committed by Nicolas Dichtel on Apr 02, 2015

Don't use dev->iflink anymore.

CC: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 Jan, 2015: 3 commits

Revert "IPoIB: change init sequence ordering" · bb759634

Committed by Roland Dreier on Jan 30, 2015

This reverts commit 3bcce487.

The series of IPoIB bug fixes that went into 3.19-rc1 introduced
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.
Signed-off-by: Roland Dreier <roland@purestorage.com>

Revert "IPoIB: Use dedicated workqueues per interface" · 0306eda2

Committed by Roland Dreier on Jan 30, 2015

This reverts commit 5141861c.

The series of IPoIB bug fixes that went into 3.19-rc1 introduced
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.
Signed-off-by: Roland Dreier <roland@purestorage.com>

Revert "IPoIB: No longer use flush as a parameter" · a84544a4

Committed by Roland Dreier on Jan 30, 2015

This reverts commit ce347ab9.

The series of IPoIB bug fixes that went into 3.19-rc1 introduced
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.
Signed-off-by: Roland Dreier <roland@purestorage.com>

16 Dec, 2014: 3 commits

IPoIB: No longer use flush as a parameter · ce347ab9

Committed by Doug Ledford on Dec 10, 2014

IPoIB: Use dedicated workqueues per interface · 5141861c

Committed by Doug Ledford on Dec 10, 2014

During my recent work on the rtnl lock deadlock in the IPoIB driver, I
saw that even once I fixed the apparent races for a single device, as
soon as that device had any children, new races popped up. It turns
out that this is because, no matter how well we protect against races
on a single device, all devices use the same workqueue, and
flush_workqueue() flushes *everything* from that workqueue. So we can
have one device in the middle of a down and holding the rtnl lock, and
another totally unrelated device needing to run mcast_restart_task,
which wants the rtnl lock and will loop trying to take it unless it
sees its own FLAG_ADMIN_UP flag go away. Because the unrelated
interface will never see its own ADMIN_UP flag drop, the interface
going down will deadlock trying to flush the queue. There are several
possible solutions to this problem:

Make carrier_on_task and mcast_restart_task try to take the rtnl for
some set period of time and if they fail, then bail. This runs the
real risk of dropping work on the floor, which can end up being its
own separate kind of deadlock.

Set some global flag in the driver that says some device is in the
middle of going down, letting all tasks know to bail. Again, this can
drop work on the floor. I suppose if our own ADMIN_UP flag doesn't go
away, then maybe after a few tries on the rtnl lock we can queue our
own task back up as a delayed work and return and avoid dropping work
on the floor that way. But I'm not 100% convinced that we won't cause
other problems.

Or the method this patch attempts to use, which is when we bring an
interface up, create a workqueue specifically for that interface, so
that when we take it back down, we are flushing only those tasks
associated with our interface. In addition, keep the global
workqueue, but now limit it to only flush tasks. In this way, the
flush tasks can always flush the device specific work queues without
having deadlock issues.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IPoIB: change init sequence ordering · 3bcce487

Committed by Doug Ledford on Dec 10, 2014

In preparation for using per device work queues, we need to move the
start of the neighbor thread task to after ipoib_ib_dev_init and move
the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
Otherwise we will end up freeing our workqueue with work possibly
still on it.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

08 Oct, 2014: 1 commit

net: better IFF_XMIT_DST_RELEASE support · 02875878

Committed by Eric Dumazet on Oct 05, 2014

Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.

Dropping dst in validate_xmit_skb() is certainly too late in case the
packet was queued by CPU X but dequeued by CPU Y.

The logical point to take care of drop/force is in __dev_queue_xmit(),
before even taking the qdisc lock.

As Julian Anastasov pointed out, the need for skb_dst() might come from some
packet schedulers or classifiers.

This patch adds a new helper to cleanly express the needs of various drivers
or qdiscs/classifiers.

Drivers that need skb_dst() in their ndo_start_xmit() should call the
following helper in their setup instead of the previous code:

	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
	netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one of which may
eventually be rebuilt in bonding/team drivers.

The other one is permanent and blocks IFF_XMIT_DST_RELEASE from being
rebuilt in bonding/team. We could add something smarter later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Sep, 2014: 1 commit

ipoib: validate struct ipoib_cb size · b49fe362

Committed by Eric Dumazet on Sep 18, 2014

To catch future errors sooner.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
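
A sketch of the kind of compile-time check this adds. It is written with C11 `_Static_assert` instead of the kernel's `BUILD_BUG_ON()`. The 48-byte control block matches the size of `cb[]` in `struct sk_buff`. The layout of the hypothetical `struct drv_cb` is an assumption for illustration: 28 bytes reserved for another layer, plus a 20-byte IPoIB hardware address.

```c
#include <stdint.h>

/* Hypothetical packet buffer with a 48-byte private control block,
 * the same size as struct sk_buff's cb[] field. */
struct pkt_buf {
	char cb[48];
};

/* Hypothetical driver-private data stored in that control block. */
struct drv_cb {
	uint8_t other_layer[28];  /* space used by another layer (assumed) */
	uint8_t hwaddr[20];       /* 20-byte IPoIB hardware address */
};

/* Fails the build, instead of corrupting memory at run time, if the
 * private struct ever outgrows the control block. */
_Static_assert(sizeof(struct drv_cb) <= sizeof(((struct pkt_buf *)0)->cb),
	       "struct drv_cb does not fit in pkt_buf.cb");

static struct drv_cb *drv_skb_cb(struct pkt_buf *b)
{
	return (struct drv_cb *)b->cb;
}
```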

11 Aug, 2014: 1 commit

IB/ipoib: Avoid multicast join attempts with invalid P_key · dd57c930

Committed by Alex Estrin on Aug 06, 2014

Currently, the parent interface keeps sending broadcast group join
requests even if p_key index 0 is invalid, which is possible/common in
virtualized environments where a VF has been probed to VM but the
actual P_key configuration has not yet been assigned by the management
software. This creates unnecessary noise on the fabric and in the
kernel logs:

    ib0: multicast join failed for ff12:401b:8000:0000:0000:0000:ffff:ffff, status -22

The original code ran the multicast task regardless of the actual
P_key value, which can be avoided. The fix is to re-init resources and
bring the interface up only if P_key index 0 is valid, either when starting
up or on a PKEY_CHANGE event.

Fixes: c2904141 ("IPoIB: Fix pkey change flow for virtualization environments")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

05 Aug, 2014: 2 commits

IB/ipoib: Avoid flushing the workqueue from worker context · 4eae3748

Committed by Erez Shitrit on Jul 08, 2014

The error flow of ipoib_ib_dev_open() invokes ipoib_ib_dev_stop() with
workqueue flushing enabled, which deadlocks if the open procedure
itself was called by a worker thread.

Fix this by adding a flush-enabled flag to ipoib_ib_dev_open() and setting
it accordingly from the locations where such a call is made.

The call trace was the following:

 [<ffffffff81095bc4>] ? flush_workqueue+0x54/0x80
 [<ffffffffa056c657>] ? ipoib_ib_dev_stop+0x447/0x650 [ib_ipoib]
 [<ffffffffa056cc34>] ? ipoib_ib_dev_open+0x284/0x430 [ib_ipoib]
 [<ffffffffa05674a8>] ? ipoib_open+0x78/0x1d0 [ib_ipoib]
 [<ffffffffa05697b8>] ? ipoib_pkey_open+0x38/0x40 [ib_ipoib]
 [<ffffffffa056cf3c>] ? __ipoib_ib_dev_flush+0x15c/0x2c0 [ib_ipoib]
 [<ffffffffa056ce56>] ? __ipoib_ib_dev_flush+0x76/0x2c0 [ib_ipoib]
 [<ffffffffa056d0a0>] ? ipoib_ib_dev_flush_heavy+0x0/0x20 [ib_ipoib]
 [<ffffffffa056d0ba>] ? ipoib_ib_dev_flush_heavy+0x1a/0x20 [ib_ipoib]
 [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/ipoib: Use P_Key change event instead of P_Key polling mechanism · db84f880

Committed by Erez Shitrit on Jul 08, 2014

The current code uses dedicated polling logic to determine when the P_Key
assigned to the ipoib device is present in the HCA port table, and acts accordingly.

Switch to the code that acts upon a PKEY_CHANGE event to handle this
task, and remove the P_Key polling logic/thread, as they add extra complexity
which isn't needed.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

16 Jul, 2014: 1 commit

net: set name_assign_type in alloc_netdev() · c835a677

Committed by Tom Gundersen on Jul 14, 2014

Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
all users to pass NET_NAME_UNKNOWN.

Coccinelle patch:

@@
expression sizeof_priv, name, setup, txqs, rxqs, count;
@@

(
-alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
+alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
|
-alloc_netdev_mq(sizeof_priv, name, setup, count)
+alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
|
-alloc_netdev(sizeof_priv, name, setup)
+alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
)

v9: move comments here from the wrong commit
Signed-off-by: Tom Gundersen <teg@jklm.no>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Jan, 2014: 1 commit

IPoIB: Report operstate consistently when brought up without a link · 437708c4

Committed by Michal Schmidt on Jan 17, 2014

After booting without a working link, "ip link" shows:

 5: mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast
 state DOWN qlen 256
    ...
 7: mlx4_ib1.8003@mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc
 pfifo_fast state DOWN qlen 256
    ...

Then after connecting and disconnecting the link, which should result
in exactly the same state as before, it shows:

 5: mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast
 state DOWN qlen 256
    ...
 7: mlx4_ib1.8003@mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc
 pfifo_fast state LOWERLAYERDOWN qlen 256
    ...

Notice the (now correct) LOWERLAYERDOWN operstate shown for the
mlx4_ib1.8003 interface. Ideally the identical state would be shown
right after boot.

The problem is related to the calling of netif_carrier_off() in
network drivers.  For a long time it was known that doing
netif_carrier_off() before registering the netdevice would result in
the interface's operstate being shown as UNKNOWN if the device was
brought up without a working link. This problem was fixed in commit
8f4cccbb ('net: Set device operstate at registration time'), but
still there remains the minor inconsistency demonstrated above.

This patch fixes it by moving ipoib's call to netif_carrier_off() into
the .ndo_open method, which is where network drivers ordinarily do it.
With the patch when doing the same test as above, the operstate of
mlx4_ib1.8003 is shown as LOWERLAYERDOWN right after boot.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

09 Nov, 2013: 2 commits

IPoIB: lower NAPI weight · 7f1a3867

Committed by Michal Schmidt on Aug 21, 2013

Since commit 82dc3c63 ("net: introduce NAPI_POLL_WEIGHT")
netif_napi_add() produces an error message if a NAPI poll weight
greater than 64 is requested.

Use the standard NAPI weight.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IPoIB: Fix deadlock between dev_change_flags() and __ipoib_dev_flush() · f47944cc

Committed by Erez Shitrit on Oct 16, 2013

When ipoib interface is going down it takes all of its children with
it, under mutex.

For each child, dev_change_flags() is called. That function calls
ipoib_stop() via the ndo, which flushes the workqueue.
Sometimes an __ipoib_dev_flush() work item is waiting in the workqueue
and, when invoked, tries to take the same mutex, which leads to a deadlock,
as seen below.

The solution is to switch to an rw-sem instead of a mutex.

The deadlock:
[11028.165303]  [<ffffffff812b0977>] ? vgacon_scroll+0x107/0x2e0
[11028.171844]  [<ffffffff814eaac5>] schedule_timeout+0x215/0x2e0
[11028.178465]  [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80
[11028.185962]  [<ffffffff814ea743>] wait_for_common+0x123/0x180
[11028.192491]  [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
[11028.199504]  [<ffffffff814ea85d>] wait_for_completion+0x1d/0x20
[11028.206224]  [<ffffffff8108b4f1>] flush_cpu_workqueue+0x61/0x90
[11028.212948]  [<ffffffff8108b5a0>] ? wq_barrier_func+0x0/0x20
[11028.219375]  [<ffffffff8108bfc4>] flush_workqueue+0x54/0x80
[11028.225712]  [<ffffffffa05a0576>] ipoib_mcast_stop_thread+0x66/0x90 [ib_ipoib]
[11028.233988]  [<ffffffffa059ccea>] ipoib_ib_dev_down+0x6a/0x100 [ib_ipoib]
[11028.241678]  [<ffffffffa059849a>] ipoib_stop+0x8a/0x140 [ib_ipoib]
[11028.248692]  [<ffffffff8142adf1>] dev_close+0x71/0xc0
[11028.254447]  [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0
[11028.261062]  [<ffffffffa059851b>] ipoib_stop+0x10b/0x140 [ib_ipoib]
[11028.268172]  [<ffffffff8142adf1>] dev_close+0x71/0xc0
[11028.273922]  [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0
[11028.280452]  [<ffffffff8148f20b>] devinet_ioctl+0x5eb/0x6a0
[11028.286786]  [<ffffffff814903b8>] inet_ioctl+0x88/0xa0
[11028.292633]  [<ffffffff8141591a>] sock_ioctl+0x7a/0x280
[11028.298576]  [<ffffffff81189012>] vfs_ioctl+0x22/0xa0
[11028.304326]  [<ffffffff81140540>] ? unmap_region+0x110/0x130
[11028.310756]  [<ffffffff811891b4>] do_vfs_ioctl+0x84/0x580
[11028.316897]  [<ffffffff81189731>] sys_ioctl+0x81/0xa0

and

[11028.017533]  [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80
[11028.025030]  [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
[11028.031945]  [<ffffffff814eb2ae>] __mutex_lock_slowpath+0x13e/0x180
[11028.039053]  [<ffffffff814eb14b>] mutex_lock+0x2b/0x50
[11028.044910]  [<ffffffffa059f7e7>] __ipoib_ib_dev_flush+0x37/0x210 [ib_ipoib]
[11028.052894]  [<ffffffffa059fa00>] ? ipoib_ib_dev_flush_light+0x0/0x20 [ib_ipoib]
[11028.061363]  [<ffffffffa059fa17>] ipoib_ib_dev_flush_light+0x17/0x20 [ib_ipoib]
[11028.069738]  [<ffffffff8108b120>] worker_thread+0x170/0x2a0
[11028.076068]  [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
[11028.083374]  [<ffffffff8108afb0>] ? worker_thread+0x0/0x2a0
[11028.089709]  [<ffffffff81090626>] kthread+0x96/0xa0
[11028.095266]  [<ffffffff8100c0ca>] child_rip+0xa/0x20
[11028.100921]  [<ffffffff81090590>] ? kthread+0x0/0xa0
[11028.106573]  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
[11028.112423] INFO: task ifconfig:23640 blocked for more than 120 seconds.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

14 Aug, 2013: 1 commit

IPoIB: Fix race in deleting ipoib_neigh entries · 49b8e744

Committed by Jim Foraker on Aug 08, 2013

In several places, this snippet is used when removing neigh entries:

	list_del(&neigh->list);
	ipoib_neigh_free(neigh);

The list_del() removes neigh from the associated struct ipoib_path, while
ipoib_neigh_free() removes neigh from the device's neigh entry lookup
table.  Both of these operations are protected by the priv->lock
spinlock.  The table however is also protected via RCU, and so naturally
the lock is not held when doing reads.

This leads to a race condition, in which a thread may successfully look
up a neigh entry that has already been deleted from neigh->list.  Since
the previous deletion will have marked the entry with poison, a second
list_del() on the object will cause a panic:

  #5 [ffff8802338c3c70] general_protection at ffffffff815108c5
     [exception RIP: list_del+16]
     RIP: ffffffff81289020  RSP: ffff8802338c3d20  RFLAGS: 00010082
     RAX: dead000000200200  RBX: ffff880433e60c88  RCX: 0000000000009e6c
     RDX: 0000000000000246  RSI: ffff8806012ca298  RDI: ffff880433e60c88
     RBP: ffff8802338c3d30   R8: ffff8806012ca2e8   R9: 00000000ffffffff
     R10: 0000000000000001  R11: 0000000000000000  R12: ffff8804346b2020
     R13: ffff88032a3e7540  R14: ffff8804346b26e0  R15: 0000000000000246
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib]
  #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm]
  #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm]
  #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10
 #10 [ffff8802338c3ee8] kthread at ffffffff81096c66
 #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca

We move the list_del() into ipoib_neigh_free(), so that deletion happens
only once, after the entry has been successfully removed from the lookup
table.  This same behavior is already used in ipoib_del_neighs_by_gid()
and __ipoib_reap_neigh().
Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
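
A greatly simplified, single-threaded model of the fix. `struct neigh`, `in_table` and the list helpers are hypothetical stand-ins; the real code uses `list_head`, an RCU-protected hash table and `priv->lock`. The point is only that `list_del()` now runs exactly once, inside the free routine, for whichever caller actually removed the entry from the lookup table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *h) { h->prev = h->next = h; }

static void list_add(struct list_node *n, struct list_node *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;   /* stand-in for the kernel's poison values */
}

/* Hypothetical neighbour entry: on a path list and in a lookup table. */
struct neigh {
	struct list_node list;
	bool in_table;
};

/*
 * Fixed shape: list_del() happens inside the free routine, and only for
 * the caller that actually removed the entry from the lookup table. A
 * second caller holding a stale pointer sees in_table == false and does
 * nothing, instead of running list_del() on poisoned pointers.
 */
static bool neigh_free(struct neigh *n)
{
	if (!n->in_table)
		return false;
	n->in_table = false;
	list_del(&n->list);
	return true;
}
```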

01 Aug, 2013: 1 commit

IPoIB: Make sure child devices use valid/proper pkeys · 3d790a4c

Committed by Or Gerlitz on Jul 18, 2013

Make sure that the IB invalid pkey (0x0000 or 0x8000) isn't used for
child devices.

Also, make sure to always set the full membership bit for the pkey of
devices created by rtnl link ops.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
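
The validity rule and the membership bit can be expressed as two small helpers. In an InfiniBand P_Key, bit 15 is the membership bit (1 = full member) and the low 15 bits are the key itself. That is why 0x0000 and 0x8000 are the two invalid values mentioned above. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u   /* bit 15: full membership */
#define PKEY_BASE_MASK   0x7fffu   /* low 15 bits: the key itself */

/* 0x0000 and 0x8000 both have a zero base value and are invalid. */
static bool pkey_is_valid(uint16_t pkey)
{
	return (pkey & PKEY_BASE_MASK) != 0;
}

/* Force full membership, as done for devices created via rtnl link ops. */
static uint16_t pkey_full_member(uint16_t pkey)
{
	return (uint16_t)(pkey | PKEY_FULL_MEMBER);
}
```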

18 Apr, 2013: 1 commit

IPoIB: add support for TIPC protocol · dc850b0e

Committed by Patrick McHardy on Apr 17, 2013

Support TIPC in the IPoIB driver. Since IPoIB now keeps track of its own
neighbour entries and doesn't require the packet to have a dst_entry
anymore, the only necessary changes are to:

- not drop multicast TIPC packets because of the unknown ethernet type
- handle unicast TIPC packets similar to IPv4/IPv6 unicast packets

in ipoib_start_xmit().

An alternative would be to remove all ethertype limitations, since they're
not necessary anymore; the only special cases are ARP and RARP, for which
a "path find" is always performed, even if a path is already known.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
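
The unicast ethertype handling described above can be sketched as a switch statement. The ethertype values are the standard ones from `<linux/if_ether.h>`, redefined here so the sketch is self-contained. The enum and function are hypothetical. They also compare in host byte order, whereas the driver compares against `htons()` values.

```c
#include <stdint.h>

/* Standard ethertype values, as in <linux/if_ether.h>. */
#define ETH_P_IP   0x0800
#define ETH_P_ARP  0x0806
#define ETH_P_RARP 0x8035
#define ETH_P_IPV6 0x86DD
#define ETH_P_TIPC 0x88CA

enum xmit_action { XMIT_NEIGH, XMIT_PATH_FIND, XMIT_DROP };

/* Simplified unicast classification in host byte order. */
static enum xmit_action classify_unicast(uint16_t proto)
{
	switch (proto) {
	case ETH_P_IP:
	case ETH_P_IPV6:
	case ETH_P_TIPC:              /* new: handled like IPv4/IPv6 */
		return XMIT_NEIGH;        /* use the neighbour entry */
	case ETH_P_ARP:
	case ETH_P_RARP:
		return XMIT_PATH_FIND;    /* always perform a path find */
	default:
		return XMIT_DROP;         /* ethertype not supported */
	}
}
```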
