Commits · fc33a8943efb25bc025750c7f4ea643fae526860 · openanolis / cloud-kernel

23 Jul, 2017 3 commits

IB/ipoib: Make sure no in-flight joins while leaving that mcast · a08e1120

Committed by Erez Shitrit on Jul 12, 2017

When cleaning up neighs, if there is a send-only mcast neigh, the driver
should wait for its join process to finish before trying to remove it.

Without this patch, we will see messages like: "ipoib_mcast_leave on an
in-flight join" and unexpected results in the join_complete.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

a08e1120

IB/ipoib: Use cancel_delayed_work_sync when needed · 6bdc8de2

Committed by Erez Shitrit on Jul 12, 2017

The mcast_task work can re-queue itself, so cancel followed by
flush_workqueue can still leave a queued task in flight. Use
cancel_delayed_work_sync instead.

Also, there is no need to hold the lock around the cancel. The original
lock was there because of the bit assignment (IPOIB_MCAST_RUN), which is
no longer in use.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

6bdc8de2
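
The re-queue problem described in the commit above can be illustrated with a small userspace model. This is only a sketch: it is single-threaded, it does not use the kernel workqueue API, and every `toy_` name is hypothetical. It assumes that a synchronous cancel blocks re-queueing while it waits for the running handler, which is the property the commit message relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded model of a work item; not the kernel workqueue API. */
struct toy_work {
    bool pending;   /* an instance is queued and waiting to run */
    bool running;   /* the handler is executing right now */
    bool canceling; /* a synchronous cancel is in progress */
};

/* Queueing is refused while a synchronous cancel is in progress. */
static bool toy_queue(struct toy_work *w)
{
    if (w->canceling || w->pending)
        return false;
    w->pending = true;
    return true;
}

/* The handler re-arms itself before it finishes, like a self re-queueing task. */
static void toy_handler_finish(struct toy_work *w)
{
    assert(w->running);
    toy_queue(w);
    w->running = false;
}

/* cancel followed by flush: returns true if an instance is still queued. */
static bool toy_cancel_then_flush(struct toy_work *w)
{
    w->pending = false;          /* cancel: drops only a queued instance */
    if (w->running)
        toy_handler_finish(w);   /* flush: wait for the running handler */
    return w->pending;
}

/* Synchronous cancel: returns true if an instance is still queued. */
static bool toy_cancel_sync(struct toy_work *w)
{
    w->canceling = true;         /* block re-queueing first */
    w->pending = false;
    if (w->running)
        toy_handler_finish(w);   /* wait for the running handler */
    w->canceling = false;
    return w->pending;
}
```

Starting from a work item whose handler is running, `toy_cancel_then_flush` leaves a queued instance behind, while `toy_cancel_sync` does not.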

IB/ipoib: Fix race between light events and interface restart · edf3f301

Committed by Feras Daoud on Jul 10, 2017

A potential race between light_event and interface restart
may attach multicast group to an already attached QP.

Scenario:
light_event flow goes through ipoib_mcast_dev_flush function,
if a context switch occurs before calling ipoib_mcast_remove_list,
then we may face a situation where the broadcast of the priv is null
and the corresponding QP is not detached yet.
If an "interface restart" runs during the previous context switch,
the following scenario occurs:
When the device goes up, ipoib_ib_dev_up function will be called,
it will send a new registration request to the broadcast group and then
attach the group to the QP that was not detached before.

     IPOIB_FLUSH_LIGHT                                          INTERFACE RESTART

    __ipoib_ib_dev_flush                                                |
        |                                                               |
        |                                                               |
        |                                                               |
    ipoib_mcast_dev_flush                                               |
    Move mcast list and broadcast to remove_list                        |
        |                                                               |
        |                                                               |
    Context Switch-->                                                   |
        |                                                       ipoib_ib_dev_down
        |                                                               |
        |                                                               |
        |                                                       ipoib_ib_dev_up
        |                                                               |
        |                                                               |
        |                                                       ipoib_mcast_join_task
        |                                                       allocate new broadcast
        |                                                               |
        |                                                               |
        |                                                       Attach QP to multicast group
        |                                                               |
        |                                                               |
        |                                                       <--Context Switch
    ipoib_mcast_leave
    Detach QP from multicast group
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

edf3f301

02 May, 2017 4 commits

IB/core: Define 'ib' and 'roce' rdma_ah_attr types · 44c58487

Committed by Dasaratharaman Chandramouli on Apr 29, 2017

rdma_ah_attr can now be either ib or roce allowing
core components to use one type or the other and also
to define attributes unique to a specific type. struct
ib_ah is also initialized with the type when it's first
created. This ensures that calls such as modify_ah
don't modify the type of the address handle attribute.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

44c58487

IB/core: Use rdma_ah_attr accessor functions · d8966fcd

Committed by Dasaratharaman Chandramouli on Apr 29, 2017

Modify core and driver components to use accessor functions
introduced to access individual fields of rdma_ah_attr
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d8966fcd
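
As a rough illustration of the accessor pattern from the two commits above, the sketch below shows a typed attribute structure whose fields are reached only through small helper functions, plus a modify step that refuses to change the type. The structure layout, field selection, and all `toy_` names are assumptions made for this example; they are not the kernel's actual definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the 'ib' and 'roce' attribute types. */
enum toy_ah_attr_type { TOY_AH_ATTR_TYPE_IB, TOY_AH_ATTR_TYPE_ROCE };

struct toy_ah_attr {
    enum toy_ah_attr_type type;
    uint8_t sl;                          /* common to both types */
    union {
        struct { uint16_t dlid; } ib;    /* IB-only attribute */
        struct { uint8_t dmac[6]; } roce;/* RoCE-only attribute */
    };
};

/* Accessors for a field shared by both types. */
static inline void toy_ah_set_sl(struct toy_ah_attr *attr, uint8_t sl)
{
    attr->sl = sl;
}

static inline uint8_t toy_ah_get_sl(const struct toy_ah_attr *attr)
{
    return attr->sl;
}

/* Accessors for a type-specific field: ignored or zero for other types. */
static inline void toy_ah_set_dlid(struct toy_ah_attr *attr, uint16_t dlid)
{
    if (attr->type == TOY_AH_ATTR_TYPE_IB)
        attr->ib.dlid = dlid;
}

static inline uint16_t toy_ah_get_dlid(const struct toy_ah_attr *attr)
{
    return attr->type == TOY_AH_ATTR_TYPE_IB ? attr->ib.dlid : 0;
}

/* Modify step: attributes of a different type are rejected, so the type
 * chosen at creation time never changes. */
static int toy_modify_ah(struct toy_ah_attr *cur, const struct toy_ah_attr *new_attr)
{
    assert(cur != NULL && new_attr != NULL);
    if (cur->type != new_attr->type)
        return -EINVAL;
    *cur = *new_attr;
    return 0;
}
```

Callers that go through the accessors do not need to know which union member backs a field, which is what allows the underlying structure to grow type-specific members later.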

IB/core: Rename struct ib_ah_attr to rdma_ah_attr · 90898850

Committed by Dasaratharaman Chandramouli on Apr 29, 2017

This patch simply renames struct ib_ah_attr to
rdma_ah_attr as these fields specify attributes that are
not necessarily specific to IB.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

90898850

IB/IPoIB: Remove 'else' when the 'if' has a return. · cfd51935

Committed by Dasaratharaman Chandramouli on Apr 29, 2017

This patch fixes a checkpatch issue related to not having
to use an 'else' if the 'if' path returns from the function.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

cfd51935

29 Apr, 2017 1 commit

IB/SA: Modify SA to implicitly cache Class Port info · ee1c60b1

Committed by Dasaratharaman Chandramouli on Mar 20, 2017

SA will query and cache class port info as part of
its initialization. SA will also invalidate and
refresh the cache based on specific events. Callers such
as IPoIB and CM can query the SA to get the classportinfo
information. Apart from making the caller code much simpler,
this change puts the onus on the SA to query and maintain
classportinfo much like how it maintains the address handle to the SM.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ee1c60b1

21 Apr, 2017 3 commits

IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow · 3e31a490

Committed by Feras Daoud on Mar 19, 2017

Before calling ipoib_stop, rtnl_lock should be taken, then
the flow clears the IPOIB_FLAG_ADMIN_UP and IPOIB_FLAG_OPER_UP
flags, and waits for mcast completion if IPOIB_MCAST_FLAG_BUSY
is set.

On the other hand, the flow of multicast join task initializes
a mcast completion, sets the IPOIB_MCAST_FLAG_BUSY and calls
ipoib_mcast_join. If IPOIB_FLAG_OPER_UP flag is not set, this
call returns EINVAL without setting the mcast completion and
leads to a deadlock.

    ipoib_stop                          |
        |                               |
    clear_bit(IPOIB_FLAG_ADMIN_UP)      |
        |                               |
    Context Switch                      |
        |                       ipoib_mcast_join_task
        |                               |
        |                       spin_lock_irq(lock)
        |                               |
        |                       init_completion(mcast)
        |                               |
        |                       set_bit(IPOIB_MCAST_FLAG_BUSY)
        |                               |
        |                       Context Switch
        |                               |
    clear_bit(IPOIB_FLAG_OPER_UP)       |
        |                               |
    spin_lock_irqsave(lock)             |
        |                               |
    Context Switch                      |
        |                       ipoib_mcast_join
        |                       return (-EINVAL)
        |                               |
        |                       spin_unlock_irq(lock)
        |                               |
        |                       Context Switch
        |                               |
    ipoib_mcast_dev_flush               |
    wait_for_completion(mcast)          |

ipoib_stop will wait for the mcast completion forever and will
not release the rtnl_lock. As a result, a panic occurs with the
following trace:

    [13441.639268] Call Trace:
    [13441.640150]  [<ffffffff8168b579>] schedule+0x29/0x70
    [13441.641038]  [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0
    [13441.641914]  [<ffffffff810bc017>] ? complete+0x47/0x50
    [13441.642765]  [<ffffffff810a690d>] ? flush_workqueue_prep_pwqs+0x16d/0x200
    [13441.643580]  [<ffffffff8168b956>] wait_for_completion+0x116/0x170
    [13441.644434]  [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20
    [13441.645293]  [<ffffffffa05af170>] ipoib_mcast_dev_flush+0x150/0x190 [ib_ipoib]
    [13441.646159]  [<ffffffffa05ac967>] ipoib_ib_dev_down+0x37/0x60 [ib_ipoib]
    [13441.647013]  [<ffffffffa05a4805>] ipoib_stop+0x75/0x150 [ib_ipoib]

Fixes: 08bc3276 ("IB/ipoib: fix for rare multicast join race condition")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3e31a490
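
The deadlock in the commit above comes down to an error path that sets a busy flag and then returns without signalling the completion. The sketch below models one way to close that gap: the error path clears the flag and signals the completion before returning. The commit text shown here does not include the actual patch, so this is an assumption about the shape of the fix, and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one multicast entry. */
struct toy_mcast {
    bool busy;      /* stands in for IPOIB_MCAST_FLAG_BUSY */
    bool completed; /* stands in for the mcast completion */
};

/* Join-task side: arm the completion, mark busy, then try to join. */
static int toy_mcast_join(struct toy_mcast *m, bool oper_up)
{
    assert(m != NULL);
    m->completed = false; /* models init_completion() */
    m->busy = true;       /* models set_bit(IPOIB_MCAST_FLAG_BUSY) */

    if (!oper_up) {
        /* Error path: release any waiter before bailing out. */
        m->busy = false;
        m->completed = true; /* models complete() */
        return -EINVAL;
    }

    /* Success path: the join callback would signal the completion
     * later; that part is not modeled here. */
    return 0;
}

/* Stop side: the flush only waits when the entry is busy and the
 * completion has not been signalled yet. */
static bool toy_flush_would_block(const struct toy_mcast *m)
{
    return m->busy && !m->completed;
}
```

With the error path handled this way, a stop flow that runs after a failed join finds nothing to wait for.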

IB/IPoIB: Support acceleration options callbacks · cd565b4b

Committed by Erez Shitrit on Apr 10, 2017

IPoIB driver now uses the new set of callback functions.

If the hardware provider supports the new ipoib_options implementation,
the driver uses the callbacks in its data path flows, otherwise it uses the
driver default implementation for all data flows in its code.

The default implementation wasn't changed; it is exactly as it was before
the introduction of acceleration support.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

cd565b4b

IB/IPoIB: Use defined function for netdev_priv function · c1048aff

Committed by Erez Shitrit on Apr 10, 2017

Make ipoib_priv point to netdev_priv where the code calls netdev_priv.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

c1048aff

25 Jan, 2017 1 commit

IB/ipoib: Remove the unnecessary error check · 5c37077f

Committed by Zhu Yanjun on Jan 18, 2017

The function ipoib_mcast_start_thread/ipoib_ib_dev_up always return zero.
As such, in the function ipoib_open, err_stop will never be reached.
So remove this err_stop and change the return type of the function
ipoib_mcast_start_thread/ipoib_ib_dev_up to void.
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

5c37077f

13 Jan, 2017 1 commit

IB/ipoib: Add detailed error message to dev_queue_xmit call · d32b9a81

Committed by Feras Daoud on Dec 28, 2016

Add a detailed return code to dev_queue_xmit function when
calling to requeue packet via __skb_dequeue.
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d32b9a81

15 Dec, 2016 1 commit

IPoIB: Avoid reading an uninitialized member variable · 11b642b8

Committed by Bart Van Assche on Nov 21, 2016

This patch prevents Coverity from reporting the following:

    Using uninitialized value port_attr.state when calling printk

Fixes: commit 94232d9c ("IPoIB: Start multicast join process only on active ports")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

11b642b8

14 Oct, 2016 1 commit

IB/ipoib: move back IB LL address into the hard header · fc791b63

Committed by Paolo Abeni on Oct 13, 2016

After the commit 9207f9d4 ("net: preserve IP control block
during GSO segmentation"), the GSO CB and the IPoIB CB conflict.
That destroys the IPoIB address information cached there,
causing a severe performance regression, as better described here:

http://marc.info/?l=linux-kernel&m=146787279825501&w=2

This change moves the data cached by the IPoIB driver from the
skb control block into the IPoIB hard header, as was done before
the commit 936d7de3 ("IPoIB: Stop lying about hard_header_len
and use skb->cb to stash LL addresses").
In order to avoid GRO issues, on packet reception the IPoIB driver
stashes a dummy pseudo header into the skb, so that the received
packets actually have a hard header matching the declared length.
To avoid changing the connected mode maximum mtu, the allocated
head buffer size is increased by the pseudo header length.

After this commit, IPoIB performance is back to its pre-regression
value.

v2 -> v3: rebased
v1 -> v2: avoid changing the max mtu, increasing the head buf size

Fixes: 9207f9d4 ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fc791b63

07 Jun, 2016 1 commit

IB/IPoIB: Disable bottom half when dealing with device address · 9b29953b

Committed by Mark Bloch on Jun 04, 2016

Align locking usage when touching device address with rest
of the kernel. Lock the bottom half when doing so using
netif_addr_lock_bh.

This also solves the following case as reported by lockdep:
        CPU0                    CPU1
        ----                    ----
    lock(_xmit_INFINIBAND);
                                local_irq_disable();
                                lock(&(&mc->mca_lock)->rlock);
                                lock(_xmit_INFINIBAND);
    <Interrupt>
    lock(&(&mc->mca_lock)->rlock);

*** DEADLOCK ***

Fixes: 492a7e67 ("IB/IPoIB: Allow setting the device address")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9b29953b

26 May, 2016 2 commits

IB/IPoIB: Allow setting the device address · 492a7e67

Committed by Mark Bloch on May 18, 2016

In IB networks, and specifically in IPoIB/rdmacm traffic, the device
address of an IPoIB interface is used as a means to exchange information
between nodes needed for communication.

Currently an IPoIB interface will always be created with a device
address based on its node GUID without a way to change that.

This change adds the ability to set the device address of an IPoIB
interface by value. We use the set mac address ndo to do that.

The flow breaks down into two cases:
1) The GID value is already in the GID table,
   in this case the interface will be able to set carrier up.

2) The GID value is not yet in the GID table,
   in this case the interface won't try to join the multicast group
   and will wait (listen on GID_CHANGE event) until the GID is inserted.

In order to track those changes, we add a new flag:
* IPOIB_FLAG_DEV_ADDR_SET.

When set, it means the dev_addr is based on a value in the GID
table. This bit is cleared upon a dev_addr change triggered
by the user and set after validation.

Per the IB spec, the port GUID can't change while the module is loaded.
The port GUID is the basis for the GID at index 0, which is the basis for
the default device address of an IPoIB interface.

The issue is that there are devices that don't follow the spec:
they change the port GUID while the HCA is powered on. So, in order
not to break userspace applications, we need to check whether the
user wanted to control the device address. We assume that
if the user sets the device address back to be based on GID index 0,
they no longer wish to control it.

In order to track this, we add an additional flag:
* IPOIB_FLAG_DEV_ADDR_CTRL

When setting the device address, there is no validation of the upper
twelve bytes of the device address (flags, qpn, subnet prefix) as those
bytes are not under the control of the user.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

492a7e67
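
The address handling described in the commit above can be sketched as follows. The example assumes a 20-byte device address whose upper twelve bytes (flags, QPN, subnet prefix) are ignored and whose lower eight bytes are compared against the lower eight bytes of each GID table entry. The two booleans stand in for IPOIB_FLAG_DEV_ADDR_SET and IPOIB_FLAG_DEV_ADDR_CTRL. All `toy_` names, and the exact matching rule, are assumptions for illustration rather than the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_IPOIB_ALEN 20 /* flags (1) + QPN (3) + GID (16) */
#define TOY_GID_LEN    16 /* subnet prefix (8) + GUID part (8) */
#define TOY_GUID_LEN    8

struct toy_gid { uint8_t raw[TOY_GID_LEN]; };

struct toy_dev_flags {
    bool dev_addr_set;  /* address is backed by a GID table entry */
    bool dev_addr_ctrl; /* user wants to control the address */
};

/* Return the index of the GID whose GUID part matches the address,
 * or -1. The upper twelve bytes of the address are not validated. */
static int toy_find_gid_index(const uint8_t addr[TOY_IPOIB_ALEN],
                              const struct toy_gid *table, size_t n)
{
    const uint8_t *guid = addr + TOY_IPOIB_ALEN - TOY_GUID_LEN;

    for (size_t i = 0; i < n; i++) {
        const uint8_t *entry = table[i].raw + TOY_GID_LEN - TOY_GUID_LEN;
        if (memcmp(guid, entry, TOY_GUID_LEN) == 0)
            return (int)i;
    }
    return -1;
}

/* Update both flags after the user sets a new device address. */
static void toy_set_dev_addr(struct toy_dev_flags *f,
                             const uint8_t addr[TOY_IPOIB_ALEN],
                             const struct toy_gid *table, size_t n)
{
    int idx = toy_find_gid_index(addr, table, n);

    assert(f != NULL);
    f->dev_addr_set = (idx >= 0); /* case 1: found; case 2: wait for GID_CHANGE */
    f->dev_addr_ctrl = (idx != 0); /* index 0 means "back to the default" */
}
```

An address that matches GID index 0 clears the control flag, one that matches another index sets both flags, and one that matches nothing leaves the address unset until the GID appears.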

IB/ipoib: Support SendOnlyFullMember MCG for SendOnly join · 3b561130

Committed by Erez Shitrit on May 25, 2016

Check (via an SA query) whether the SM supports the new option for SendOnly
multicast joins.
If the SM supports that option, the driver uses the new join state to create
such a multicast group.
In other words, if SendOnlyFullMember is supported, we no longer use a faked
FullMember state join for a SendOnly MCG; we use the correct state instead.

This check is performed at every invocation of mcast_restart task, to be
sure that the driver stays in sync with the current state of the SM.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3b561130

13 Feb, 2016 1 commit

IB/ipoib: fix for rare multicast join race condition · 08bc3276

Committed by Alex Estrin on Feb 11, 2016

A narrow window for a race condition still exists between the
multicast join thread and the *dev_flush workers.
A kernel crash caused by prolonged erratic link state changes
was observed (most likely due to faulty cabling):

[167275.656270] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[167275.665973] IP: [<ffffffffa05f8f2e>] ipoib_mcast_join+0xae/0x1d0 [ib_ipoib]
[167275.674443] PGD 0
[167275.677373] Oops: 0000 [#1] SMP
...
[167275.977530] Call Trace:
[167275.982225]  [<ffffffffa05f92f0>] ? ipoib_mcast_free+0x200/0x200 [ib_ipoib]
[167275.992024]  [<ffffffffa05fa1b7>] ipoib_mcast_join_task+0x2a7/0x490 [ib_ipoib]
[167276.002149]  [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[167276.010754]  [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[167276.019088]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[167276.027737]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
The crash was hit here:
ipoib_mcast_join() {
..............
      rec.qkey      = priv->broadcast->mcmember.qkey;
                                       ^^^^^^^
.....
 }
The proposed patch prevents the multicast join task from continuing
if a link state change is detected.
Signed-off-by: Alex Estrin <alex.estrin@intel.com>

Changes from v4:
- as suggested by Doug Ledford, optimized spinlock usage,
i.e. ipoib_mcast_join() is called with lock held.
Changes from v3:
- sync with priv->lock before flag check.
Changes from v2:
- Move check for OPER_UP flag state to mcast_join() to
ensure no event worker is in progress.
- minor style fixes.
Changes from v1:
- No need to lock again if error detected.
Signed-off-by: Doug Ledford <dledford@redhat.com>

08bc3276
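
The guard described in the commit above can be sketched as a check that runs before the broadcast record is dereferenced. This is a simplified model: locking is not shown (the changelog says the real function is called with the lock held), and the `toy_` structures and function name are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_mcmember { uint32_t qkey; };
struct toy_bcast { struct toy_mcmember mcmember; };

struct toy_priv {
    bool oper_up;                /* stands in for IPOIB_FLAG_OPER_UP */
    struct toy_bcast *broadcast; /* may be NULL after a flush */
};

/* Read the qkey for a join request, but only if the link is up and the
 * broadcast group still exists. The caller is assumed to hold the lock
 * that protects both fields. */
static int toy_join_read_qkey(const struct toy_priv *priv, uint32_t *qkey_out)
{
    assert(priv != NULL && qkey_out != NULL);
    if (!priv->broadcast || !priv->oper_up)
        return -EINVAL;

    *qkey_out = priv->broadcast->mcmember.qkey;
    return 0;
}
```

Returning early in this way avoids the NULL dereference shown in the trace, at the cost of the join being retried later.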

20 Jan, 2016 1 commit

IB/IPoIB: Fix kernel panic on multicast flow · 50be28de

Committed by Erez Shitrit on Jan 07, 2016

ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the
parameter mcast->dev. That mcast is a temporary (used as an iterator)
variable that may be uninitialized.
There is no need to send the variable dev to the function, as each mcast
has its dev as a member in the mcast struct.

This causes the following panic:
RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib]
RSP: 0018: EFLAGS: 00010246
RAX: f0201 RBX: 24e00 RCX: 00000
....
....
Stack:
Call Trace:
	ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib]
	ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib]
	process_one_work+0x164/0x470
	worker_thread+0x11d/0x420
	...

Fixes: 5a0e81f6 ('IB/IPoIB: factor out common multicast list removal code')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reported-by: Doron Tsur <doront@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

50be28de

24 Dec, 2015 2 commits

IB/IPoIB: Move multicast specific code out of ipoib_main.c · 432c55ff

Committed by Christoph Lameter on Dec 21, 2015

Code cleanup to move multicast specific code that checks for
a sendonly join to ipoib_multicast.c. This allows the removal
of the export of __ipoib_mcast_find().
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

432c55ff

IB/IPoIB: factor out common multicast list removal code · 5a0e81f6

Committed by Christoph Lameter on Dec 21, 2015

Code cleanup to remove multicast specific code from ipoib_main.c

The removal of a list of multicast groups occurs in three places.
Create a new function ipoib_mcast_remove_list(). Use this new
function in ipoib_main.c too.
That in turn allows the dropping of two functions that were
exported from ipoib_multicast.c for expiration of mc groups.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

5a0e81f6

22 Oct, 2015 1 commit

IB/core: Add netdev and gid attributes paramteres to cache · 55ee3ab2

Committed by Matan Barak on Oct 15, 2015

Adding an ability to query the IB cache by a netdev and get the
attributes of a GID. These parameters are necessary in order to
successfully resolve the required GID (when the netdevice is known)
and get the Ethernet L2 attributes from a GID.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

55ee3ab2

14 Oct, 2015 1 commit

IB/ipoib: For sendonly join free the multicast group on leave · 0b5c9279

Committed by Christoph Lameter on Oct 11, 2015

When we leave the multicast group on expiration of a neighbor we
do not free the mcast structure. This results in a memory leak
that causes ib_dealloc_pd to fail and print a WARN_ON message
and backtrace.

Fixes: bd99b2e0 (IB/ipoib: Expire sendonly multicast joins)
Signed-off-by: Christoph Lameter <cl@linux.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

0b5c9279

08 Oct, 2015 1 commit

IB: split struct ib_send_wr · e622f2f4

Committed by Christoph Hellwig on Oct 08, 2015

This patch splits up struct ib_send_wr so that all non-trivial verbs
use their own structure which embeds struct ib_send_wr.  This dramatically
shrinks the size of a WR for most common operations:

sizeof(struct ib_send_wr) (old):	96

sizeof(struct ib_send_wr):		48
sizeof(struct ib_rdma_wr):		64
sizeof(struct ib_atomic_wr):		96
sizeof(struct ib_ud_wr):		88
sizeof(struct ib_fast_reg_wr):		88
sizeof(struct ib_bind_mw_wr):		96
sizeof(struct ib_sig_handover_wr):	80

And with Sagi's pending MR rework the fast registration WR will also be
down to a reasonable size:

sizeof(struct ib_fastreg_wr):		64
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
Tested-by: Haggai Eran <haggaie@mellanox.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>

e622f2f4
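
The embedding approach in the commit above can be illustrated with a small base structure and one verb-specific structure that contains it as a member. The field lists below are invented for the example and are much smaller than the real ones, and the `toy_` names are hypothetical. The point is only the pattern: code that receives a pointer to the base can recover the enclosing structure from the member offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small base work request shared by all verbs (fields are illustrative). */
struct toy_send_wr {
    struct toy_send_wr *next;
    uint64_t wr_id;
    int opcode;
};

/* Verb-specific request: embeds the base and adds only what it needs. */
struct toy_rdma_wr {
    struct toy_send_wr wr;
    uint64_t remote_addr;
    uint32_t rkey;
};

/* Recover the enclosing structure from a pointer to its member. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Typed helper so callers do not spell out the macro each time. */
static inline struct toy_rdma_wr *toy_rdma_wr(struct toy_send_wr *wr)
{
    assert(wr != NULL);
    return toy_container_of(wr, struct toy_rdma_wr, wr);
}
```

A generic posting path can keep walking the `next` chain of base requests, while a driver that sees an RDMA opcode converts the base pointer back with `toy_rdma_wr()` to reach `remote_addr` and `rkey`. Simple sends then pay only for the base structure.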

26 Sep, 2015 2 commits

IB/ipoib: Make sendonly multicast joins create the mcast group · c3852ab0

Committed by Doug Ledford on Sep 25, 2015

Since IPoIB should, as much as possible, emulate how multicast
sends work on Ethernet for regular TCP/IP apps, there should be
no requirement to subscribe to a multicast group before your
sends are properly sent.  However, due to the difference in how
multicast is handled on InfiniBand, we must join the appropriate
multicast group before we can send to it.  Previously we tried
not to trigger the auto-create feature of the subnet manager when
doing this because we didn't have tracking of these sendonly
groups and the auto-creation might never get undone.  The previous
patch added timing to these sendonly joins and allows us to
leave them after a reasonable idle expiration time.  So supply
all of the information needed to auto-create group.
Signed-off-by: Doug Ledford <dledford@redhat.com>

c3852ab0

IB/ipoib: Expire sendonly multicast joins · bd99b2e0

Committed by Christoph Lameter on Sep 24, 2015

On neighbor expiration, check to see if the neighbor was actually a
sendonly multicast join, and if so, leave the multicast group as we
expire the neighbor.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

bd99b2e0

04 Sep, 2015 2 commits

IB/ipoib: Suppress warning for send only join failures · d1178cbc

Committed by Jason Gunthorpe on Aug 21, 2015

We expect send-only joins to fail; it just means there are no listeners
for the group. The correct thing to do is silently drop the packet
at source.

E.g. avahi will do a full join of 224.0.0.251, which causes a send-only IGMP
packet to 224.0.0.22 and then, if there is no IP router listening to IGMP,
a warning-level kernel message like this:

 ib0: sendonly multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d1178cbc
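
The behaviour described in the commit above amounts to choosing a quieter log level for one expected failure. The sketch below assumes the expected failure is identified by a send-only join combined with status -EINVAL (the -22 in the quoted log line). Whether the actual patch keys on exactly that status is an assumption, and the `toy_` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum toy_log_level { TOY_LOG_DEBUG, TOY_LOG_WARN };

/* Pick the log level for a failed multicast join. A send-only join that
 * fails with -EINVAL just means nobody is listening, so it is demoted
 * to a debug message. Everything else still warns. */
static enum toy_log_level toy_join_failure_level(bool sendonly, int status)
{
    assert(status != 0); /* only meaningful for failures */
    if (sendonly && status == -EINVAL)
        return TOY_LOG_DEBUG;
    return TOY_LOG_WARN;
}
```

Full-member join failures and other error codes keep their warning, so real problems remain visible.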

IB/ipoib: Clean up send-only multicast joins · c3acdc06

Committed by Doug Ledford on Sep 03, 2015

Even though we don't expect the group to be created by the SM, we
still need to provide all the parameters to force the SM to validate
that they are correct.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

c3acdc06

16 Apr, 2015 9 commits

IB/ipoib: Remove IPOIB_MCAST_RUN bit · 0e5544d9

Committed by Erez Shitrit on Apr 02, 2015

After Doug Ledford's changes there is no need for that bit; its
semantics become a subset of the IPOIB_FLAG_OPER_UP bit.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

0e5544d9

IB/ipoib: Update broadcast record values after each successful join request · 3fd0605c

Committed by Erez Shitrit on Apr 02, 2015

Update the cached broadcast record in the priv object after every new
join of this broadcast domain group.

These values are needed for the port configuration (MTU size) and for
the initial parameters of all new multicast (non-broadcast) join requests.

For example, the SM starts with a 2K MTU for the whole fabric, and later
restarts (or hands over to a new SM) with a new port configuration of 4K MTU.
Without using the new values, the driver will keep its old configuration
of 2K and will not apply the new configuration of 4K.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3fd0605c
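
The 2K to 4K example in the commit above can be sketched as a join-completion handler that refreshes the cached broadcast record every time, not only on the first join. The enum values follow the usual IB MTU encoding (1 for 256 bytes up to 5 for 4096 bytes). The structures and `toy_` names are hypothetical, and the byte values ignore any encapsulation overhead the driver would subtract.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MTU encoding as carried in a multicast member record. */
enum toy_ib_mtu {
    TOY_IB_MTU_256 = 1,
    TOY_IB_MTU_512 = 2,
    TOY_IB_MTU_1024 = 3,
    TOY_IB_MTU_2048 = 4,
    TOY_IB_MTU_4096 = 5,
};

static int toy_mtu_enum_to_int(enum toy_ib_mtu mtu)
{
    switch (mtu) {
    case TOY_IB_MTU_256:  return 256;
    case TOY_IB_MTU_512:  return 512;
    case TOY_IB_MTU_1024: return 1024;
    case TOY_IB_MTU_2048: return 2048;
    case TOY_IB_MTU_4096: return 4096;
    default:              return -1;
    }
}

struct toy_mcmember_rec {
    enum toy_ib_mtu mtu;
    uint32_t qkey;
};

struct toy_bcast_priv {
    struct toy_mcmember_rec broadcast_rec; /* cached broadcast record */
    int mcast_mtu;                         /* derived port configuration */
};

/* Called after every successful broadcast join: overwrite the cached
 * record and recompute the values derived from it. */
static void toy_broadcast_join_complete(struct toy_bcast_priv *priv,
                                        const struct toy_mcmember_rec *rec)
{
    assert(priv != NULL && rec != NULL);
    priv->broadcast_rec = *rec;
    priv->mcast_mtu = toy_mtu_enum_to_int(rec->mtu);
}
```

If the cache were filled only once, the second join in the example would leave `mcast_mtu` at 2048 even though the SM now advertises 4096.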

IB/ipoib: drop mcast_mutex usage · 1c0453d6