提交 · 8966e28d2e40cfc9f694bd02dabc49afb78d7160 · openeuler / raspberrypi-kernel

26 10月, 2017 2 次提交

IB/ipoib: Use NAPI in UD/TX flows · 8966e28d

由 Erez Shitrit 提交于 10月 19, 2017

Instead of explicit call to poll_cq of the tx ring, use the NAPI mechanism
to handle the completions of each packet that has been sent to the HW.

The next major changes were taken:
 * The driver init completion function in the creation of the send CQ,
   that function triggers the napi scheduling.
 * The driver uses CQ for RX for both modes UD and CM, and CQ for TX
   for CM and UD.

Cc: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Reviewed-by: NAlex Vesker <valex@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

8966e28d

IB/ipoib: Get rid of the tx_outstanding variable in all modes · 2c104ea6

由 Erez Shitrit 提交于 10月 19, 2017

The first step toward using NAPI in the UD/TX flow is to separate
between two flows, the NAPI and the xmit, meaning no use of shared
variables between both flows.

This patch takes out the tx_outstanding variable that was used in both
flows and instead the driver uses the 2 cyclic ring variables: tx_head
and tx_tail, tx_head used in the xmit flow and tx_tail in the NAPI flow.

Cc: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Reviewed-by: NAlex Vesker <valex@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

2c104ea6

29 9月, 2017 1 次提交

IB/{ipoib, iser}: Consistent print format of vendor error · b04dc199

由 Ajaykumar Hotchandani 提交于 9月 27, 2017

Vendor error print should be consistent across protocols to avoid any
confusion.
This patch corrects that.
Suggested-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: NWengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Acked-by: NSagi Grimberg <sagi@grimberg.me>
Acked-by: NRoi Dayan <roid@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b04dc199

23 9月, 2017 1 次提交

IB/ipoib: Suppress the retry related completion errors · af3c79be

由 Santosh Shilimkar 提交于 9月 07, 2017

IPoIB doesn't support transport/rnr retry schemes as per
RFC so those errors are expected. No need to flood the
log files with them.
Tested-by: NMichael Nowak <michael.nowak@oracle.com>
Tested-by: NRafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: NLiwen Huang <liwen.huang@oracle.com>
Tested-by: NHong Liu <hong.x.liu@oracle.com>
Reviewed-by: NMukesh Kacker <mukesh.kacker@oracle.com>
Reported-by: NRajiv Raja <rajiv.raja@oracle.com>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

af3c79be

25 8月, 2017 1 次提交

IB/ipoib: Sync between remove_one to sysfs calls that use rtnl_lock · 69956d83

由 Erez Shitrit 提交于 8月 17, 2017

In order to avoid deadlock between sysfs functions (like create/delete
child) and remove_one (both of them are using the sysfs lock and
rtnl_lock) the driver will use a state mutex for sync.

That will fix traces as the following:
schedule+0x3e/0x90
kernfs_drain+0x75/0xf0
? wait_woken+0x90/0x90
__kernfs_remove+0x12e/0x1c0
kernfs_remove+0x25/0x40
sysfs_remove_dir+0x57/0x90
kobject_del+0x22/0x60
device_del+0x195/0x230
 pm_runtime_set_memalloc_noio+0xac/0xf0
netdev_unregister_kobject+0x71/0x80
rollback_registered_many+0x205/0x2f0
rollback_registered+0x31/0x40
unregister_netdevice_queue+0x58/0xb0
unregister_netdev+0x20/0x30
ipoib_remove_one+0xb7/0x240 [ib_ipoib]
ib_unregister_device+0xbc/0x1b0 [ib_core]
ib_unregister_mad_agent+0x29/0x30 [ib_core]
mlx4_ib_remove+0x67/0x280 [mlx4_ib]
INFO: task echo:24082 blocked for more than 120 seconds.
Tainted: G           OE   4.1.12-37.5.1.el6uek.x86_64 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
Call Trace:
schedule+0x3e/0x90
schedule_preempt_disabled+0xe/0x10
__mutex_lock_slowpath+0x95/0x110
? _rcu_barrier+0x177/0x220
mutex_lock+0x23/0x40
rtnl_lock+0x15/0x20
netdev_run_todo+0x81/0x1f0
rtnl_unlock+0xe/0x10
ipoib_vlan_delete+0x12f/0x1c0 [ib_ipoib]
delete_child+0x69/0x80 [ib_ipoib]
dev_attr_store+0x20/0x30
sysfs_kf_write+0x41/0x50
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Reviewed-by: NAlex Vesker <valex@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

69956d83

23 7月, 2017 1 次提交

IB/ipoib: Remove double pointer assigning · 1b355094

由 Leon Romanovsky 提交于 7月 15, 2017

There is no need to assign "p" pointer twice.

This patch fixes the following smatch warning:
drivers/infiniband/ulp/ipoib/ipoib_cm.c:517 ipoib_cm_rx_handler() warn:
	missing break? reassigning 'p->id'

Fixes: 839fcaba ("IPoIB: Connected mode experimental support")
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

1b355094

18 7月, 2017 2 次提交

IB/IPoIB: Convert IPoIB to memalloc_noio_* calls · d83187dd

由 Leon Romanovsky 提交于 5月 23, 2017

Commit 21caf2fc ("mm: teach mm by current context info to not do I/O
during memory allocation") added the memalloc_noio_(save|restore) functions
to enable people to modify the MM behavior by disabling I/O during memory
allocation. This was further extended in Fixes: 934f3072 ("mm: clear
__GFP_FS when PF_MEMALLOC_NOIO is set"). memalloc_noio_* functions prevent
allocation paths recursing back into the filesystem without explicitly
changing the flags for every allocation site.

However the IPoIB hasn't been keeping up with the changes and missed
completely these memalloc_noio_* calls. This led to update of
allocation site with special QP creation flag, see commit 09b93088
("IB: Add a QP creation flag to use GFP_NOIO allocations"), while this
flag is supported by small number of drivers in IB stack.

Let's change it by updating to memalloc_noio_* calls and allow
for every driver underneath enjoy NOIO allocations.
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d83187dd

IB: Convert msleep below 20ms to usleep_range · 98e77d9f

由 Leon Romanovsky 提交于 5月 23, 2017

The msleep(1) may do not sleep 1 ms as expected
and will sleep longer. The simple conversion from
msleep to usleep_range between 1ms and 2ms can solve an
issue.

The full and comprehensive explanation can be found at [1] and [2].

[1] https://lkml.org/lkml/2007/8/3/250
[2] Documentation/timers/timers-howto.txt
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

98e77d9f

02 5月, 2017 1 次提交

IB/SA: Rename ib_sa_path_rec to sa_path_rec · c2f8fc4e

由 Dasaratharaman Chandramouli 提交于 4月 27, 2017

Rename ib_sa_path_rec to a more generic sa_path_rec.
This is part of extending ib_sa to also support OPA
path records in addition to the IB defined path records.
Reviewed-by: NDon Hiatt <don.hiatt@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c2f8fc4e

21 4月, 2017 1 次提交

IB/IPoIB: Use defined function for netdev_priv function · c1048aff

由 Erez Shitrit 提交于 4月 10, 2017

Make ipoib_priv point to netdev_priv where the code calls netdev_priv.
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Reviewed-by: NAlex Vesker <valex@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c1048aff

02 3月, 2017 1 次提交

sched/headers: Prepare to move signal wakeup & sigpending methods from... · 174cd4b1

由 Ingo Molnar 提交于 2月 02, 2017

sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>

Fix up affected files that include this signal functionality via sched.h.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

174cd4b1

19 2月, 2017 2 次提交

IB/ipoib: Remove redudant label · 23536dfa

由 Zhu Yanjun 提交于 2月 14, 2017

There are 2 labels to mark the same statements. Replace the 2 labels
with one versatile labe.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

23536dfa

IB/ipoib: remove the unnecessary memory free · c5e8f57b

由 Zhu Yanjun 提交于 2月 10, 2017

In the function ipoib_cm_nonsrq_init_rx, the memory is not
allocated successfully. It is not necessary to free it.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c5e8f57b

13 1月, 2017 4 次提交

IB/ipoib: Change list_del to list_del_init in the tx object · 27d41d29

由 Feras Daoud 提交于 12月 28, 2016

Since ipoib_cm_tx_start function and ipoib_cm_tx_reap function
belong to different work queues, they can run in parallel.
In this case if ipoib_cm_tx_reap calls list_del and release the
lock, ipoib_cm_tx_start may acquire it and call list_del_init
on the already deleted object.
Changing list_del to list_del_init in ipoib_cm_tx_reap fixes the problem.

Fixes: 839fcaba ("IPoIB: Connected mode experimental support")
Signed-off-by: NFeras Daoud <ferasda@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Reviewed-by: NAlex Vesker <valex@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

27d41d29

IB/ipoib: Use debug prints instead of warnings in RNR WC status · 13ee429a

由 Feras Daoud 提交于 12月 28, 2016

If a receive request has not been posted to the work queue, the incoming
message is rejected and the peer will receive a receiver-not-ready (RNR)
error. In IPoIB, IB_WC_RNR_RETRY_EXC_ERR error is part of the life cycle
therefore ipoib_cm_handle_tx_wc function will print to debug instead
of warnings.
Signed-off-by: NFeras Daoud <ferasda@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

13ee429a

IB/ipoib: Add detailed error message to dev_queue_xmit call · d32b9a81

由 Feras Daoud 提交于 12月 28, 2016

Add a detailed return code to dev_queue_xmit function when
calling to requeue packet via __skb_dequeue.
Signed-off-by: NFeras Daoud <ferasda@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d32b9a81

IB/ipoib: Fix deadlock between rmmod and set_mode · 0a0007f2

由 Feras Daoud 提交于 12月 28, 2016

When calling set_mode from sys/fs, the call flow locks the sys/fs lock
first and then tries to lock rtnl_lock (when calling ipoib_set_mod).
On the other hand, the rmmod call flow takes the rtnl_lock first
(when calling unregister_netdev) and then tries to take the sys/fs
lock. Deadlock a->b, b->a.

The problem starts when ipoib_set_mod frees it's rtnl_lck and tries
to get it after that.

    set_mod:
    [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90
    [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180
    [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20
    [<ffffffff814fed2b>] mutex_lock+0x2b/0x50
    [<ffffffff81448675>] rtnl_lock+0x15/0x20
    [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib]
    [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib]
    [<ffffffff8134b840>] dev_attr_store+0x20/0x30
    [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170
    [<ffffffff8117b068>] vfs_write+0xb8/0x1a0
    [<ffffffff8117ba81>] sys_write+0x51/0x90
    [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

    rmmod:
    [<ffffffff81279ffc>] ? put_dec+0x10c/0x110
    [<ffffffff8127a2ee>] ? number+0x2ee/0x320
    [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0
    [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0
    [<ffffffff8127b550>] ? string+0x40/0x100
    [<ffffffff814fe323>] wait_for_common+0x123/0x180
    [<ffffffff81060250>] ? default_wake_function+0x0/0x20
    [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0
    [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20
    [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270
    [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0
    [<ffffffff81273f66>] kobject_del+0x16/0x40
    [<ffffffff8134cd14>] device_del+0x184/0x1e0
    [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0
    [<ffffffff8143c05e>] rollback_registered+0xae/0x130
    [<ffffffff8143c102>] unregister_netdevice+0x22/0x70
    [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30
    [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib]
    [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core]
    [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib]
    [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core]

Fixes: 862096a8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
Cc: <stable@vger.kernel.org> # v3.6+
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NFeras Daoud <ferasda@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0a0007f2

04 12月, 2016 1 次提交

IB/ipoib: Remove and fix debug prints after allocation failure · 74226649

由 Leon Romanovsky 提交于 11月 03, 2016

The prints after [k|v][m|z|c]alloc() functions are not needed,
because in case of failure, allocator will print their internal
error prints anyway.
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

74226649

17 11月, 2016 1 次提交

IB/IPoIB: Remove can't use GFP_NOIO warning · 0b59970e

由 Kamal Heib 提交于 11月 10, 2016

Remove the warning print of "can't use of GFP_NOIO" to avoid prints in
each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO.

This print become more annoying when the IPoIB interface is configured
to work in connected mode.

Fixes: 09b93088 ('IB: Add a QP creation flag to use GFP_NOIO allocations')
Signed-off-by: NKamal Heib <kamalh@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0b59970e

14 10月, 2016 1 次提交

IB/ipoib: move back IB LL address into the hard header · fc791b63

由 Paolo Abeni 提交于 10月 13, 2016

After the commit 9207f9d4 ("net: preserve IP control block
during GSO segmentation"), the GSO CB and the IPoIB CB conflict.
That destroy the IPoIB address information cached there,
causing a severe performance regression, as better described here:

http://marc.info/?l=linux-kernel&m=146787279825501&w=2

This change moves the data cached by the IPoIB driver from the
skb control lock into the IPoIB hard header, as done before
the commit 936d7de3 ("IPoIB: Stop lying about hard_header_len
and use skb->cb to stash LL addresses").
In order to avoid GRO issue, on packet reception, the IPoIB driver
stash into the skb a dummy pseudo header, so that the received
packets have actually a hard header matching the declared length.
To avoid changing the connected mode maximum mtu, the allocated
head buffer size is increased by the pseudo header length.

After this commit, IPoIB performances are back to pre-regression
value.

v2 -> v3: rebased
v1 -> v2: avoid changing the max mtu, increasing the head buf size

Fixes: 9207f9d4 ("net: preserve IP control block during GSO segmentation")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc791b63

03 9月, 2016 1 次提交

IB/ipoib: Fix memory corruption in ipoib cm mode connect flow · 546481c2

由 Erez Shitrit 提交于 8月 28, 2016

When a new CM connection is being requested, ipoib driver copies data
from the path pointer in the CM/tx object, the path object might be
invalid at the point and memory corruption will happened later when now
the CM driver will try using that data.

The next scenario demonstrates it:
	neigh_add_path --> ipoib_cm_create_tx -->
	queue_work (pointer to path is in the cm/tx struct)
	#while the work is still in the queue,
	#the port goes down and causes the ipoib_flush_paths:
	ipoib_flush_paths --> path_free --> kfree(path)
	#at this point the work scheduled starts.
	ipoib_cm_tx_start --> copy from the (invalid)path pointer:
	(memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);)
	 -> memory corruption.

To fix that the driver now starts the CM/tx connection only if that
specific path exists in the general paths database.
This check is protected with the relevant locks, and uses the gid from
the neigh member in the CM/tx object which is valid according to the ref
count that was taken by the CM/tx.

Fixes: 839fcaba ('IPoIB: Connected mode experimental support')
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

546481c2

07 6月, 2016 1 次提交

IB/IPoIB: Fix race between ipoib_remove_one to sysfs functions · 198b12f7

由 Erez Shitrit 提交于 6月 04, 2016

In ipoib_remove_one the driver holds the rtnl_lock and tries to do some
operation like dev_change_flags or unregister_netdev, while sysfs
callback like ipoib_vlan_delete holds sysfs mutex and tries to hold the
rtnl_lock via rtnl_trylock() and restart_syscall() if the lock is not
free, meanwhile ipoib_remove_one tries to get the sysfs lock in order to
free its sysfs directory, and we will get  a->b, b->a deadlock.

    Trace like the following:

        schedule+0x37/0x80
        schedule_preempt_disabled+0xe/0x10
        __mutex_lock_slowpath+0xb5/0x120
        mutex_lock+0x23/0x40
        rtnl_lock+0x15/0x20
        netdev_run_todo+0x17c/0x320
        rtnl_unlock+0xe/0x10
        ipoib_vlan_delete+0x11b/0x1b0 [ib_ipoib]
        delete_child+0x54/0x80 [ib_ipoib]
        dev_attr_store+0x18/0x30
        sysfs_kf_write+0x37/0x40
        mutex_lock+0x16/0x40
        SyS_write+0x55/0xc0
        entry_SYSCALL_64_fastpath+0x16/0x75
    And
        schedule+0x37/0x80
        __kernfs_remove+0x1a8/0x260
        ? wake_atomic_t_function+0x60/0x60
        kernfs_remove+0x25/0x40
        sysfs_remove_dir+0x50/0x80
        kobject_del+0x18/0x50
        device_del+0x19f/0x260
        netdev_unregister_kobject+0x6a/0x80
        rollback_registered_many+0x1fd/0x340
        rollback_registered+0x3c/0x70
        unregister_netdevice_queue+0x55/0xc0
        unregister_netdev+0x20/0x30
        ipoib_remove_one+0x114/0x1b0 [ib_ipoib]
        ib_unregister_client+0x4a/0x170 [ib_core]
        ? find_module_all+0x71/0xa0
        ipoib_cleanup_module+0x10/0x94 [ib_ipoib]
        SyS_delete_module+0x1b5/0x210
        entry_SYSCALL_64_fastpath+0x16/0x75

The fix is by checking the flag IPOIB_FLAG_INTF_ON_DESTROY in order to
get out from the sysfs function.

Fixes: 862096a8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
Fixes: 9baa0b03 ("IB/ipoib: Add rtnl_link_ops support")
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

198b12f7

05 5月, 2016 1 次提交

treewide: replace dev->trans_start update with helper · 860e9538

由 Florian Westphal 提交于 5月 03, 2016

Replace all trans_start updates with netif_trans_update helper.
change was done via spatch:

struct net_device *d;
@@
- d->trans_start = jiffies
+ netif_trans_update(d)

Compile tested only.

Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: linux-xtensa@linux-xtensa.org
Cc: linux1394-devel@lists.sourceforge.net
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: MPT-FusionLinux.pdl@broadcom.com
Cc: linux-scsi@vger.kernel.org
Cc: linux-can@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-omap@vger.kernel.org
Cc: linux-hams@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: devel@driverdev.osuosl.org
Cc: b.a.t.m.a.n@lists.open-mesh.org
Cc: linux-bluetooth@vger.kernel.org
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Acked-by: NFelipe Balbi <felipe.balbi@linux.intel.com>
Acked-by: NMugunthan V N <mugunthanvnm@ti.com>
Acked-by: NAntonio Quartulli <a@unstable.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

860e9538

03 3月, 2016 1 次提交

IB/ipoib: Add handling for sending of skb with many frags · 78a50a5e

由 Hans Westgaard Ry 提交于 3月 02, 2016

IPoIB converts skb-fragments to sge adding 1 extra sge when SG is enabled.
Current codepath assumes that the max number of sge a device support
is at least MAX_SKB_FRAGS+1, there is no interaction with upper layers
to limit number of fragments in an skb if a device suports fewer
sges. The assumptions also lead to requesting a fixed number of sge
when IPoIB creates queue-pairs with SG enabled.

A fallback/slowpath is implemented using skb_linearize to
handle cases where the conversion would result in more sges than supported.
Signed-off-by: NHans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: NHåkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: NWei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

78a50a5e

23 12月, 2015 1 次提交

IB/ulps: Avoid calling ib_query_device · 4a061b28

由 Or Gerlitz 提交于 12月 18, 2015

Instead, use the cached copy of the attributes present on the device.
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4a061b28

12 12月, 2015 1 次提交

IB: add a proper completion queue abstraction · 14d3a3b2

由 Christoph Hellwig 提交于 12月 11, 2015

This adds an abstraction that allows ULPs to simply pass a completion
object and completion callback with each submitted WR and let the RDMA
core handle the nitty gritty details of how to handle completion
interrupts and poll the CQ.

In detail there is a new ib_cqe structure which just contains the
completion callback, and which can be used to get at the containing
object using container_of.  It is pointed to by the WR and WC as an
alternative to the wr_id field, similar to how many ULPs already use
the field to store a pointer using casts.

A driver using the new completion callbacks allocates it's CQs using
the new ib_create_cq API, which in addition to the number of CQEs and
the completion vectors also takes a mode on how we poll for CQEs.
Three modes are available: direct for drivers that never take CQ
interrupts and just poll for them, softirq to poll from softirq context
using the to be renamed blk-iopoll infrastructure which takes care of
rearming and budgeting, or a workqueue for consumer who want to be
called from user context.

Thanks a lot to Sagi Grimberg who helped reviewing the API, wrote
the current version of the workqueue code because my two previous
attempts sucked too much and converted the iSER initiator to the new
API.
Signed-off-by: NChristoph Hellwig <hch@lst.de>

14d3a3b2

08 10月, 2015 1 次提交

IB: split struct ib_send_wr · e622f2f4

由 Christoph Hellwig 提交于 10月 08, 2015

This patch split up struct ib_send_wr so that all non-trivial verbs
use their own structure which embedds struct ib_send_wr.  This dramaticly
shrinks the size of a WR for most common operations:

sizeof(struct ib_send_wr) (old):	96

sizeof(struct ib_send_wr):		48
sizeof(struct ib_rdma_wr):		64
sizeof(struct ib_atomic_wr):		96
sizeof(struct ib_ud_wr):		88
sizeof(struct ib_fast_reg_wr):		88
sizeof(struct ib_bind_mw_wr):		96
sizeof(struct ib_sig_handover_wr):	80

And with Sagi's pending MR rework the fast registration WR will also be
down to a reasonable size:

sizeof(struct ib_fastreg_wr):		64
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
Tested-by: NHaggai Eran <haggaie@mellanox.com>
Tested-by: NSagi Grimberg <sagig@mellanox.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>

e622f2f4

31 8月, 2015 2 次提交

IB/ipoib: Remove ib_get_dma_mr calls · 77b1f996

由 Jason Gunthorpe 提交于 7月 30, 2015

The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.
Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

77b1f996

IB/cm: Remove compare_data checks · 73fec7fd

由 Haggai Eran 提交于 7月 30, 2015

Now that there are no ib_cm clients using the compare_data feature for
matching IB CM requests' private data, remove the compare_data parameter of
ib_cm_listen and remove the code implementing the feature.
Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

73fec7fd

15 7月, 2015 1 次提交

IB/ipoib: Scatter-Gather support in connected mode · c4268778

由 Yuval Shaia 提交于 7月 12, 2015

By default, IPoIB-CM driver uses 64k MTU. Larger MTU gives better
performance.
This MTU plus overhead puts the memory allocation for IP based packets at
32 4k pages (order 5), which have to be contiguous.
When the system memory under pressure, it was observed that allocating 128k
contiguous physical memory is difficult and causes serious errors (such as
system becomes unusable).

This enhancement resolve the issue by removing the physically contiguous
memory requirement using Scatter/Gather feature that exists in Linux stack.

With this fix Scatter-Gather will be supported also in connected mode.

This change reverts some of the change made in commit e112373f
("IPoIB/cm: Reduce connected mode TX object size").

The ability to use SG in IPoIB CM is possible because the coupling
between NETIF_F_SG and NETIF_F_CSUM was removed in commit
ec5f0615 ("net: Kill link between CSUM and SG features.")
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Acked-by: NChristian Marie <christian@ponies.io>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c4268778

06 5月, 2015 1 次提交

IPoIB/CM: Fix indentation level · 8f71c1a2

由 Bart Van Assche 提交于 5月 05, 2015

See also patch "IPoIB/cm: Add connected mode support for devices
without SRQs" (commit ID 68e995a2). Detected by smatch.
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Cc: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

8f71c1a2

16 4月, 2015 1 次提交

IB/ipoib: Use dedicated workqueues per interface · 0b39578b

由 Doug Ledford 提交于 2月 21, 2015

During my recent work on the rtnl lock deadlock in the IPoIB driver, I
saw that even once I fixed the apparent races for a single device, as
soon as that device had any children, new races popped up.  It turns
out that this is because no matter how well we protect against races
on a single device, the fact that all devices use the same workqueue,
and flush_workqueue() flushes *everything* from that workqueue means
that we would also have to prevent all races between different devices
(for instance, ipoib_mcast_restart_task on interface ib0 can race with
ipoib_mcast_flush_dev on interface ib0.8002, resulting in a deadlock on
the rtnl_lock).

There are several possible solutions to this problem:

Make carrier_on_task and mcast_restart_task try to take the rtnl for
some set period of time and if they fail, then bail.  This runs the
real risk of dropping work on the floor, which can end up being its
own separate kind of deadlock.

Set some global flag in the driver that says some device is in the
middle of going down, letting all tasks know to bail.  Again, this can
drop work on the floor.

Or the method this patch attempts to use, which is when we bring an
interface up, create a workqueue specifically for that interface, so
that when we take it back down, we are flushing only those tasks
associated with our interface.  In addition, keep the global
workqueue, but now limit it to only flush tasks.  In this way, the
flush tasks can always flush the device specific work queues without
having deadlock issues.
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0b39578b

31 1月, 2015 1 次提交

Revert "IPoIB: Use dedicated workqueues per interface" · 0306eda2

由 Roland Dreier 提交于 1月 30, 2015

This reverts commit 5141861c.

The series of IPoIB bug fixes that went into 3.19-rc1 introduce
regressions, and after trying to sort things out, we decided to revert
to 3.18's IPoIB driver and get things right for 3.20.
Signed-off-by: NRoland Dreier <roland@purestorage.com>

0306eda2

16 12月, 2014 1 次提交

IPoIB: Use dedicated workqueues per interface · 5141861c

由 Doug Ledford 提交于 12月 10, 2014

During my recent work on the rtnl lock deadlock in the IPoIB driver, I
saw that even once I fixed the apparent races for a single device, as
soon as that device had any children, new races popped up. It turns
out that this is because no matter how well we protect against races
on a single device, the fact that all devices use the same workqueue,
and flush_workqueue() flushes *everything* from that workqueue, we can
have one device in the middle of a down and holding the rtnl lock and
another totally unrelated device needing to run mcast_restart_task,
which wants the rtnl lock and will loop trying to take it unless is
sees its own FLAG_ADMIN_UP flag go away. Because the unrelated
interface will never see its own ADMIN_UP flag drop, the interface
going down will deadlock trying to flush the queue. There are several
possible solutions to this problem:

Make carrier_on_task and mcast_restart_task try to take the rtnl for
some set period of time and if they fail, then bail. This runs the
real risk of dropping work on the floor, which can end up being its
own separate kind of deadlock.

Set some global flag in the driver that says some device is in the
middle of going down, letting all tasks know to bail. Again, this can
drop work on the floor. I suppose if our own ADMIN_UP flag doesn't go
away, then maybe after a few tries on the rtnl lock we can queue our
own task back up as a delayed work and return and avoid dropping work
on the floor that way. But I'm not 100% convinced that we won't cause
other problems.

Or the method this patch attempts to use, which is when we bring an
interface up, create a workqueue specifically for that interface, so
that when we take it back down, we are flushing only those tasks
associated with our interface. In addition, keep the global
workqueue, but now limit it to only flush tasks. In this way, the
flush tasks can always flush the device specific work queues without
having deadlock issues.
Signed-off-by: NDoug Ledford <dledford@redhat.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

5141861c

03 6月, 2014 1 次提交

IB: Add a QP creation flag to use GFP_NOIO allocations · 09b93088

由 Or Gerlitz 提交于 5月 11, 2014

This addresses a problem where NFS client writes over IPoIB connected
mode may deadlock on memory allocation/writeback.

The problem is not directly memory reclamation.  There is an indirect
dependency between network filesystems writing back pages and
ipoib_cm_tx_init() due to how a kworker is used.  Page reclaim cannot
make forward progress until ipoib_cm_tx_init() succeeds and it is
stuck in page reclaim itself waiting for network transmission.
Ordinarily this situation may be avoided by having the caller use
GFP_NOFS but ipoib_cm_tx_init() does not have that information.

To address this, take a general approach and add a new QP creation
flag that tells the low-level hardware driver to use GFP_NOIO for the
memory allocations related to the new QP.

Use the new flag in the ipoib connected mode path, and if the driver
doesn't support it, re-issue the QP creation without the flag.
Signed-off-by: NMel Gorman <mgorman@suse.de>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

09b93088

09 11月, 2013 1 次提交

IPoIB: Change CM skb memory allocation to be non-atomic during init · 22252b4e

由 Tal Alon 提交于 10月 16, 2013

Change CM skb memory allocation to use GFP_KERNEL when possible.

During device init there's no need to use GFP_ATOMIC when allocating
memory for the CM skbs -- use GFP_KERNEL instead.
Signed-off-by: NTal Alon <talal@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

22252b4e

14 8月, 2013 1 次提交

IPoIB: Fix race in deleting ipoib_neigh entries · 49b8e744

由 Jim Foraker 提交于 8月 08, 2013

In several places, this snippet is used when removing neigh entries:

	list_del(&neigh->list);
	ipoib_neigh_free(neigh);

The list_del() removes neigh from the associated struct ipoib_path, while
ipoib_neigh_free() removes neigh from the device's neigh entry lookup
table.  Both of these operations are protected by the priv->lock
spinlock.  The table however is also protected via RCU, and so naturally
the lock is not held when doing reads.

This leads to a race condition, in which a thread may successfully look
up a neigh entry that has already been deleted from neigh->list.  Since
the previous deletion will have marked the entry with poison, a second
list_del() on the object will cause a panic:

  #5 [ffff8802338c3c70] general_protection at ffffffff815108c5
     [exception RIP: list_del+16]
     RIP: ffffffff81289020  RSP: ffff8802338c3d20  RFLAGS: 00010082
     RAX: dead000000200200  RBX: ffff880433e60c88  RCX: 0000000000009e6c
     RDX: 0000000000000246  RSI: ffff8806012ca298  RDI: ffff880433e60c88
     RBP: ffff8802338c3d30   R8: ffff8806012ca2e8   R9: 00000000ffffffff
     R10: 0000000000000001  R11: 0000000000000000  R12: ffff8804346b2020
     R13: ffff88032a3e7540  R14: ffff8804346b26e0  R15: 0000000000000246
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib]
  #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm]
  #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm]
  #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10
 #10 [ffff8802338c3ee8] kthread at ffffffff81096c66
 #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca

We move the list_del() into ipoib_neigh_free(), so that deletion happens
only once, after the entry has been successfully removed from the lookup
table.  This same behavior is already used in ipoib_del_neighs_by_gid()
and __ipoib_reap_neigh().
Signed-off-by: NJim Foraker <foraker1@llnl.gov>
Reviewed-by: NOr Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: NJack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: NShlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

49b8e744

08 5月, 2013 1 次提交

drivers/infiniband/hw: rename random32() to prandom_u32() · 50bea5c0

由 Andrew Morton 提交于 5月 07, 2013

Use preferable function name which implies using a pseudo-random number
generator.
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

50bea5c0

17 4月, 2013 1 次提交

RDMA: Rename random32() to prandom_u32() · cc529c0d

由 Akinobu Mita 提交于 3月 04, 2013

Use more preferable function name which implies using a pseudo-random
number generator.
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Steve Wise <swise@chelsio.com>
Cc: linux-rdma@vger.kernel.org
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

cc529c0d

23 3月, 2013 1 次提交

IPoIB: Fix send lockup due to missed TX completion · 1ee9e2aa

由 Mike Marciniszyn 提交于 2月 26, 2013

Commit f0dc117a ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.

The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.

This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the
ib_req_notify_cq().  If the rc is above 0, the UD send cq completion
callback is called directly to re-arm the send completion timer.

This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.

Cc: <stable@vger.kernel.org>
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

1ee9e2aa