提交 · 28a0ea77ba50fc8a9684b7af96ece05dfeb6f64b · openeuler / Kernel

06 9月, 2018 1 次提交

IB/ipoib: Avoid a race condition between start_xmit and cm_rep_handler · 816e846c

由 Aaron Knister 提交于 8月 24, 2018

Inside of start_xmit() the call to check if the connection is up and the
queueing of the packets for later transmission is not atomic which leaves
a window where cm_rep_handler can run, set the connection up, dequeue
pending packets and leave the subsequently queued packets by start_xmit()
sitting on neigh->queue until they're dropped when the connection is torn
down. This only applies to connected mode. These dropped packets can
really upset TCP, for example, and cause multi-minute delays in
transmission for open connections.

Here's the code in start_xmit where we check to see if the connection is
up:

       if (ipoib_cm_get(neigh)) {
               if (ipoib_cm_up(neigh)) {
                       ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
                       goto unref;
               }
       }

The race occurs if cm_rep_handler execution occurs after the above
connection check (specifically if it gets to the point where it acquires
priv->lock to dequeue pending skb's) but before the below code snippet in
start_xmit where packets are queued.

       if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
               push_pseudo_header(skb, phdr->hwaddr);
               spin_lock_irqsave(&priv->lock, flags);
               __skb_queue_tail(&neigh->queue, skb);
               spin_unlock_irqrestore(&priv->lock, flags);
       } else {
               ++dev->stats.tx_dropped;
               dev_kfree_skb_any(skb);
       }

The patch acquires the netif tx lock in cm_rep_handler for the section
where it sets the connection up and dequeues and retransmits deferred
skb's.

Fixes: 839fcaba ("IPoIB: Connected mode experimental support")
Cc: stable@vger.kernel.org
Signed-off-by: NAaron Knister <aaron.s.knister@nasa.gov>
Tested-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

816e846c

03 8月, 2018 2 次提交

IB/ipoib: Get rid of the sysfs_mutex · ee190ab7

由 Jason Gunthorpe 提交于 7月 29, 2018

This mutex was introduced to deal with the deadlock formed by calling
unregister_netdev from within the sysfs callback of a netdev.

Now that we have priv_destructor and needs_free_netdev we can switch
to the more targeted solution of running the unregister from a
work queue. This avoids the deadlock and gets rid of the mutex.

The next patch in the series needs this mutex eliminated to create
atomicity of unregisteration.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

ee190ab7

IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN · 577e07ff

由 Jason Gunthorpe 提交于 7月 29, 2018

This essentially duplicates the netdev's reg_state, so just use that
directly. The reg_state is updated under the rntl_lock, and all places
using GOING_DOWN already acquire the rtnl_lock so checking is safe.

Since the only place we use GOING_DOWN is for the parent device this
does not fix any bugs, but it is a step to tidy up the unregister flow
so that after later patches the flow is uniform and sane.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

577e07ff

31 7月, 2018 2 次提交

RDMA/cma: Constify path record, ib_cm_event, listen_id pointers · e7ff98ae

由 Parav Pandit 提交于 7月 29, 2018

Constify several pointers such as path_rec, ib_cm_event and listen_id
pointers in several functions.
Signed-off-by: NParav Pandit <parav@mellanox.com>
Reviewed-by: NDaniel Jurgens <danielj@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

e7ff98ae

RDMA/ipoib: Fix check for return code from ib_create_srq · e586e1e1

由 Kamal Heib 提交于 7月 30, 2018

Make sure to check for "-EOPNOTSUPP" instead of "-ENOSYS" which is the
return code from ib_create_srq() in case that it not supported.
Signed-off-by: NKamal Heib <kamalheib1@gmail.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

e586e1e1

25 7月, 2018 1 次提交

IB/IPoIB: Simplify ib_post_(send|recv|srq_recv)() calls · 4b4671a0

由 Bart Van Assche 提交于 7月 18, 2018

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

4b4671a0

10 7月, 2018 3 次提交

RDMA/ipoib: Fix use of sizeof() · b1b63970

由 Kamal Heib 提交于 7月 05, 2018

Make sure to use sizeof(...) instead of sizeof ... which is more
preferred.
Signed-off-by: NKamal Heib <kamalheib1@gmail.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

b1b63970

RDMA/ipoib: Prefer unsigned int to bare use of unsigned · 0578cdad

由 Kamal Heib 提交于 7月 05, 2018

This commit replaces all the unsigned definitions in favour of 'unsigned
int' which is preferred.
Signed-off-by: NKamal Heib <kamalheib1@gmail.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0578cdad

RDMA/ipoib: Use min_t() macro instead of min() · 299c36b1

由 Kamal Heib 提交于 7月 05, 2018

Use min_t() macro to avoid the casting when using min() macro, also fix
the type of "length" and "wc->byte_len" to be "unsigned int" and
"u32" which is the right type for each one of them.
Signed-off-by: NKamal Heib <kamalheib1@gmail.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

299c36b1

19 6月, 2018 1 次提交

IB/core: add max_send_sge and max_recv_sge attributes · 33023fb8

由 Steve Wise 提交于 6月 18, 2018

This patch replaces the ib_device_attr.max_sge with max_send_sge and
max_recv_sge. It allows ulps to take advantage of devices that have very
different send and recv sge depths.  For example cxgb4 has a max_recv_sge
of 4, yet a max_send_sge of 16.  Splitting out these attributes allows
much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
API. Consider a large RDMA WRITE that has 16 scattergather entries.
With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
16, it can be done with 1 WRITE WR.
Acked-by: NSagi Grimberg <sagi@grimberg.me>
Acked-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NSelvin Xavier <selvin.xavier@broadcom.com>
Acked-by: NShiraz Saleem <shiraz.saleem@intel.com>
Acked-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

33023fb8

13 6月, 2018 1 次提交

treewide: Use array_size() in vzalloc() · fad953ce

由 Kees Cook 提交于 6月 12, 2018

The vzalloc() function has no 2-factor argument form, so multiplication
factors need to be wrapped in array_size(). This patch replaces cases of:

        vzalloc(a * b)

with:
        vzalloc(array_size(a, b))

as well as handling cases of:

        vzalloc(a * b * c)

with:

        vzalloc(array3_size(a, b, c))

This does, however, attempt to ignore constant size factors like:

        vzalloc(4 * 1024)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  vzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  vzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  vzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
  vzalloc(
-	sizeof(TYPE) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT_ID
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT_ID
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

  vzalloc(
-	SIZE * COUNT
+	array_size(COUNT, SIZE)
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  vzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  vzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  vzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  vzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  vzalloc(C1 * C2 * C3, ...)
|
  vzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@

(
  vzalloc(C1 * C2, ...)
|
  vzalloc(
-	E1 * E2
+	array_size(E1, E2)
  , ...)
)
Signed-off-by: NKees Cook <keescook@chromium.org>

fad953ce

26 1月, 2018 1 次提交

net: don't call update_pmtu unconditionally · f15ca723

由 Nicolas Dichtel 提交于 1月 25, 2018

Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
"BUG: unable to handle kernel NULL pointer dereference at           (null)"

Let's add a helper to check if update_pmtu is available before calling it.

Fixes: 52a589d5 ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff ("vxlan: update skb dst pmtu on tx path")
CC: Roman Kapl <code@rkapl.cz>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f15ca723

04 1月, 2018 1 次提交

IB/ipoib: Fix for notify send CQ failure messages · 809cb695

由 Alex Estrin 提交于 1月 03, 2018

If IB_CQ_REPORT_MISSED_EVENTS flag is passed in ib_req_notify_cq()
it may return positive value indicating non-empty CQ.
If return code not verified the log might be flooded with false
warning messages "request notify on send CQ failed".

Fixes: 8966e28d ("IB/ipoib: Use NAPI in UD/TX flows")
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NAlex Estrin <alex.estrin@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

809cb695

14 12月, 2017 1 次提交

IB/ipoib: Restore MM behavior in case of tx_ring allocation failure · 9d98e19b

由 Yuval Shaia 提交于 12月 13, 2017

memalloc_noio_save modifies the behavior of MM, we must restore it after
we are done.

Fixes: d83187dd ("IB/IPoIB: Convert IPoIB to memalloc_noio_* calls")
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

9d98e19b

12 12月, 2017 1 次提交

IB/ipoib: Replace printk with pr_warn · c55359a2

由 Yuval Shaia 提交于 11月 29, 2017

pr_* is the preferred way to print messages, replace all
printk(KERN_WARN, ...) with pr_warn.
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: NZhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

c55359a2

26 10月, 2017 2 次提交

IB/ipoib: Use NAPI in UD/TX flows · 8966e28d

由 Erez Shitrit 提交于 10月 19, 2017

Instead of explicit call to poll_cq of the tx ring, use the NAPI mechanism
to handle the completions of each packet that has been sent to the HW.

The next major changes were taken:
 * The driver init completion function in the creation of the send CQ,
   that function triggers the napi scheduling.
 * The driver uses CQ for RX for both modes UD and CM, and CQ for TX
   for CM and UD.

Cc: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Reviewed-by: NAlex Vesker <valex@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

8966e28d

IB/ipoib: Get rid of the tx_outstanding variable in all modes · 2c104ea6

由 Erez Shitrit 提交于 10月 19, 2017

The first step toward using NAPI in the UD/TX flow is to separate
between two flows, the NAPI and the xmit, meaning no use of shared
variables between both flows.

This patch takes out the tx_outstanding variable that was used in both
flows and instead the driver uses the 2 cyclic ring variables: tx_head
and tx_tail, tx_head used in the xmit flow and tx_tail in the NAPI flow.

Cc: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Reviewed-by: NAlex Vesker <valex@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

2c104ea6

29 9月, 2017 1 次提交

IB/{ipoib, iser}: Consistent print format of vendor error · b04dc199

由 Ajaykumar Hotchandani 提交于 9月 27, 2017

Vendor error print should be consistent across protocols to avoid any
confusion.
This patch corrects that.
Suggested-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: NWengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Acked-by: NSagi Grimberg <sagi@grimberg.me>
Acked-by: NRoi Dayan <roid@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b04dc199

23 9月, 2017 1 次提交

IB/ipoib: Suppress the retry related completion errors · af3c79be

由 Santosh Shilimkar 提交于 9月 07, 2017

IPoIB doesn't support transport/rnr retry schemes as per
RFC so those errors are expected. No need to flood the
log files with them.
Tested-by: NMichael Nowak <michael.nowak@oracle.com>
Tested-by: NRafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: NLiwen Huang <liwen.huang@oracle.com>
Tested-by: NHong Liu <hong.x.liu@oracle.com>
Reviewed-by: NMukesh Kacker <mukesh.kacker@oracle.com>
Reported-by: NRajiv Raja <rajiv.raja@oracle.com>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

af3c79be

25 8月, 2017 1 次提交

IB/ipoib: Sync between remove_one to sysfs calls that use rtnl_lock · 69956d83

由 Erez Shitrit 提交于 8月 17, 2017

In order to avoid deadlock between sysfs functions (like create/delete
child) and remove_one (both of them are using the sysfs lock and
rtnl_lock) the driver will use a state mutex for sync.

That will fix traces as the following:
schedule+0x3e/0x90
kernfs_drain+0x75/0xf0
? wait_woken+0x90/0x90
__kernfs_remove+0x12e/0x1c0
kernfs_remove+0x25/0x40
sysfs_remove_dir+0x57/0x90
kobject_del+0x22/0x60
device_del+0x195/0x230
 pm_runtime_set_memalloc_noio+0xac/0xf0
netdev_unregister_kobject+0x71/0x80
rollback_registered_many+0x205/0x2f0
rollback_registered+0x31/0x40
unregister_netdevice_queue+0x58/0xb0
unregister_netdev+0x20/0x30
ipoib_remove_one+0xb7/0x240 [ib_ipoib]
ib_unregister_device+0xbc/0x1b0 [ib_core]
ib_unregister_mad_agent+0x29/0x30 [ib_core]
mlx4_ib_remove+0x67/0x280 [mlx4_ib]
INFO: task echo:24082 blocked for more than 120 seconds.
Tainted: G           OE   4.1.12-37.5.1.el6uek.x86_64 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
Call Trace:
schedule+0x3e/0x90
schedule_preempt_disabled+0xe/0x10
__mutex_lock_slowpath+0x95/0x110
? _rcu_barrier+0x177/0x220
mutex_lock+0x23/0x40
rtnl_lock+0x15/0x20
netdev_run_todo+0x81/0x1f0
rtnl_unlock+0xe/0x10
ipoib_vlan_delete+0x12f/0x1c0 [ib_ipoib]
delete_child+0x69/0x80 [ib_ipoib]
dev_attr_store+0x20/0x30
sysfs_kf_write+0x41/0x50
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Reviewed-by: NAlex Vesker <valex@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

69956d83

23 7月, 2017 1 次提交

IB/ipoib: Remove double pointer assigning · 1b355094

由 Leon Romanovsky 提交于 7月 15, 2017

There is no need to assign "p" pointer twice.

This patch fixes the following smatch warning:
drivers/infiniband/ulp/ipoib/ipoib_cm.c:517 ipoib_cm_rx_handler() warn:
	missing break? reassigning 'p->id'

Fixes: 839fcaba ("IPoIB: Connected mode experimental support")
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

1b355094

18 7月, 2017 2 次提交

IB/IPoIB: Convert IPoIB to memalloc_noio_* calls · d83187dd

由 Leon Romanovsky 提交于 5月 23, 2017

Commit 21caf2fc ("mm: teach mm by current context info to not do I/O
during memory allocation") added the memalloc_noio_(save|restore) functions
to enable people to modify the MM behavior by disabling I/O during memory
allocation. This was further extended in Fixes: 934f3072 ("mm: clear
__GFP_FS when PF_MEMALLOC_NOIO is set"). memalloc_noio_* functions prevent
allocation paths recursing back into the filesystem without explicitly
changing the flags for every allocation site.

However the IPoIB hasn't been keeping up with the changes and missed
completely these memalloc_noio_* calls. This led to update of
allocation site with special QP creation flag, see commit 09b93088
("IB: Add a QP creation flag to use GFP_NOIO allocations"), while this
flag is supported by small number of drivers in IB stack.

Let's change it by updating to memalloc_noio_* calls and allow
for every driver underneath enjoy NOIO allocations.
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d83187dd

IB: Convert msleep below 20ms to usleep_range · 98e77d9f

由 Leon Romanovsky 提交于 5月 23, 2017

The msleep(1) may do not sleep 1 ms as expected
and will sleep longer. The simple conversion from
msleep to usleep_range between 1ms and 2ms can solve an
issue.

The full and comprehensive explanation can be found at [1] and [2].

[1] https://lkml.org/lkml/2007/8/3/250
[2] Documentation/timers/timers-howto.txt
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

98e77d9f

02 5月, 2017 1 次提交

IB/SA: Rename ib_sa_path_rec to sa_path_rec · c2f8fc4e

由 Dasaratharaman Chandramouli 提交于 4月 27, 2017

Rename ib_sa_path_rec to a more generic sa_path_rec.
This is part of extending ib_sa to also support OPA
path records in addition to the IB defined path records.
Reviewed-by: NDon Hiatt <don.hiatt@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c2f8fc4e

21 4月, 2017 1 次提交

IB/IPoIB: Use defined function for netdev_priv function · c1048aff

由 Erez Shitrit 提交于 4月 10, 2017

Make ipoib_priv point to netdev_priv where the code calls netdev_priv.
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Reviewed-by: NAlex Vesker <valex@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c1048aff

02 3月, 2017 1 次提交

sched/headers: Prepare to move signal wakeup & sigpending methods from... · 174cd4b1

由 Ingo Molnar 提交于 2月 02, 2017

sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>

Fix up affected files that include this signal functionality via sched.h.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

174cd4b1

19 2月, 2017 2 次提交

IB/ipoib: Remove redudant label · 23536dfa

由 Zhu Yanjun 提交于 2月 14, 2017

There are 2 labels to mark the same statements. Replace the 2 labels
with one versatile labe.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

23536dfa

IB/ipoib: remove the unnecessary memory free · c5e8f57b

由 Zhu Yanjun 提交于 2月 10, 2017

In the function ipoib_cm_nonsrq_init_rx, the memory is not
allocated successfully. It is not necessary to free it.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c5e8f57b

13 1月, 2017 4 次提交

IB/ipoib: Change list_del to list_del_init in the tx object · 27d41d29

由 Feras Daoud 提交于 12月 28, 2016

Since ipoib_cm_tx_start function and ipoib_cm_tx_reap function
belong to different work queues, they can run in parallel.
In this case if ipoib_cm_tx_reap calls list_del and release the
lock, ipoib_cm_tx_start may acquire it and call list_del_init
on the already deleted object.
Changing list_del to list_del_init in ipoib_cm_tx_reap fixes the problem.

Fixes: 839fcaba ("IPoIB: Connected mode experimental support")
Signed-off-by: NFeras Daoud <ferasda@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Reviewed-by: NAlex Vesker <valex@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

27d41d29

IB/ipoib: Use debug prints instead of warnings in RNR WC status · 13ee429a

由 Feras Daoud 提交于 12月 28, 2016

If a receive request has not been posted to the work queue, the incoming
message is rejected and the peer will receive a receiver-not-ready (RNR)
error. In IPoIB, IB_WC_RNR_RETRY_EXC_ERR error is part of the life cycle
therefore ipoib_cm_handle_tx_wc function will print to debug instead
of warnings.
Signed-off-by: NFeras Daoud <ferasda@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

13ee429a

IB/ipoib: Add detailed error message to dev_queue_xmit call · d32b9a81

由 Feras Daoud 提交于 12月 28, 2016

Add a detailed return code to dev_queue_xmit function when
calling to requeue packet via __skb_dequeue.
Signed-off-by: NFeras Daoud <ferasda@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d32b9a81

IB/ipoib: Fix deadlock between rmmod and set_mode · 0a0007f2

由 Feras Daoud 提交于 12月 28, 2016

When calling set_mode from sys/fs, the call flow locks the sys/fs lock
first and then tries to lock rtnl_lock (when calling ipoib_set_mod).
On the other hand, the rmmod call flow takes the rtnl_lock first
(when calling unregister_netdev) and then tries to take the sys/fs
lock. Deadlock a->b, b->a.

The problem starts when ipoib_set_mod frees it's rtnl_lck and tries
to get it after that.

    set_mod:
    [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90
    [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180
    [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20
    [<ffffffff814fed2b>] mutex_lock+0x2b/0x50
    [<ffffffff81448675>] rtnl_lock+0x15/0x20
    [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib]
    [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib]
    [<ffffffff8134b840>] dev_attr_store+0x20/0x30
    [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170
    [<ffffffff8117b068>] vfs_write+0xb8/0x1a0
    [<ffffffff8117ba81>] sys_write+0x51/0x90
    [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

    rmmod:
    [<ffffffff81279ffc>] ? put_dec+0x10c/0x110
    [<ffffffff8127a2ee>] ? number+0x2ee/0x320
    [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0
    [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0
    [<ffffffff8127b550>] ? string+0x40/0x100
    [<ffffffff814fe323>] wait_for_common+0x123/0x180
    [<ffffffff81060250>] ? default_wake_function+0x0/0x20
    [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0
    [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20
    [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270
    [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0
    [<ffffffff81273f66>] kobject_del+0x16/0x40
    [<ffffffff8134cd14>] device_del+0x184/0x1e0
    [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0
    [<ffffffff8143c05e>] rollback_registered+0xae/0x130
    [<ffffffff8143c102>] unregister_netdevice+0x22/0x70
    [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30
    [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib]
    [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core]
    [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib]
    [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core]

Fixes: 862096a8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
Cc: <stable@vger.kernel.org> # v3.6+
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NFeras Daoud <ferasda@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0a0007f2

04 12月, 2016 1 次提交

IB/ipoib: Remove and fix debug prints after allocation failure · 74226649

由 Leon Romanovsky 提交于 11月 03, 2016

The prints after [k|v][m|z|c]alloc() functions are not needed,
because in case of failure, allocator will print their internal
error prints anyway.
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

74226649

17 11月, 2016 1 次提交

IB/IPoIB: Remove can't use GFP_NOIO warning · 0b59970e

由 Kamal Heib 提交于 11月 10, 2016

Remove the warning print of "can't use of GFP_NOIO" to avoid prints in
each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO.

This print become more annoying when the IPoIB interface is configured
to work in connected mode.

Fixes: 09b93088 ('IB: Add a QP creation flag to use GFP_NOIO allocations')
Signed-off-by: NKamal Heib <kamalh@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0b59970e

14 10月, 2016 1 次提交

IB/ipoib: move back IB LL address into the hard header · fc791b63

由 Paolo Abeni 提交于 10月 13, 2016

After the commit 9207f9d4 ("net: preserve IP control block
during GSO segmentation"), the GSO CB and the IPoIB CB conflict.
That destroy the IPoIB address information cached there,
causing a severe performance regression, as better described here:

http://marc.info/?l=linux-kernel&m=146787279825501&w=2

This change moves the data cached by the IPoIB driver from the
skb control lock into the IPoIB hard header, as done before
the commit 936d7de3 ("IPoIB: Stop lying about hard_header_len
and use skb->cb to stash LL addresses").
In order to avoid GRO issue, on packet reception, the IPoIB driver
stash into the skb a dummy pseudo header, so that the received
packets have actually a hard header matching the declared length.
To avoid changing the connected mode maximum mtu, the allocated
head buffer size is increased by the pseudo header length.

After this commit, IPoIB performances are back to pre-regression
value.

v2 -> v3: rebased
v1 -> v2: avoid changing the max mtu, increasing the head buf size

Fixes: 9207f9d4 ("net: preserve IP control block during GSO segmentation")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc791b63

03 9月, 2016 1 次提交

IB/ipoib: Fix memory corruption in ipoib cm mode connect flow · 546481c2

由 Erez Shitrit 提交于 8月 28, 2016

When a new CM connection is being requested, ipoib driver copies data
from the path pointer in the CM/tx object, the path object might be
invalid at the point and memory corruption will happened later when now
the CM driver will try using that data.

The next scenario demonstrates it:
	neigh_add_path --> ipoib_cm_create_tx -->
	queue_work (pointer to path is in the cm/tx struct)
	#while the work is still in the queue,
	#the port goes down and causes the ipoib_flush_paths:
	ipoib_flush_paths --> path_free --> kfree(path)
	#at this point the work scheduled starts.
	ipoib_cm_tx_start --> copy from the (invalid)path pointer:
	(memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);)
	 -> memory corruption.

To fix that the driver now starts the CM/tx connection only if that
specific path exists in the general paths database.
This check is protected with the relevant locks, and uses the gid from
the neigh member in the CM/tx object which is valid according to the ref
count that was taken by the CM/tx.

Fixes: 839fcaba ('IPoIB: Connected mode experimental support')
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

546481c2

07 6月, 2016 1 次提交

IB/IPoIB: Fix race between ipoib_remove_one to sysfs functions · 198b12f7

由 Erez Shitrit 提交于 6月 04, 2016

In ipoib_remove_one the driver holds the rtnl_lock and tries to do some
operation like dev_change_flags or unregister_netdev, while sysfs
callback like ipoib_vlan_delete holds sysfs mutex and tries to hold the
rtnl_lock via rtnl_trylock() and restart_syscall() if the lock is not
free, meanwhile ipoib_remove_one tries to get the sysfs lock in order to
free its sysfs directory, and we will get  a->b, b->a deadlock.

    Trace like the following:

        schedule+0x37/0x80
        schedule_preempt_disabled+0xe/0x10
        __mutex_lock_slowpath+0xb5/0x120
        mutex_lock+0x23/0x40
        rtnl_lock+0x15/0x20
        netdev_run_todo+0x17c/0x320
        rtnl_unlock+0xe/0x10
        ipoib_vlan_delete+0x11b/0x1b0 [ib_ipoib]
        delete_child+0x54/0x80 [ib_ipoib]
        dev_attr_store+0x18/0x30
        sysfs_kf_write+0x37/0x40
        mutex_lock+0x16/0x40
        SyS_write+0x55/0xc0
        entry_SYSCALL_64_fastpath+0x16/0x75
    And
        schedule+0x37/0x80
        __kernfs_remove+0x1a8/0x260
        ? wake_atomic_t_function+0x60/0x60
        kernfs_remove+0x25/0x40
        sysfs_remove_dir+0x50/0x80
        kobject_del+0x18/0x50
        device_del+0x19f/0x260
        netdev_unregister_kobject+0x6a/0x80
        rollback_registered_many+0x1fd/0x340
        rollback_registered+0x3c/0x70
        unregister_netdevice_queue+0x55/0xc0
        unregister_netdev+0x20/0x30
        ipoib_remove_one+0x114/0x1b0 [ib_ipoib]
        ib_unregister_client+0x4a/0x170 [ib_core]
        ? find_module_all+0x71/0xa0
        ipoib_cleanup_module+0x10/0x94 [ib_ipoib]
        SyS_delete_module+0x1b5/0x210
        entry_SYSCALL_64_fastpath+0x16/0x75

The fix is by checking the flag IPOIB_FLAG_INTF_ON_DESTROY in order to
get out from the sysfs function.

Fixes: 862096a8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
Fixes: 9baa0b03 ("IB/ipoib: Add rtnl_link_ops support")
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

198b12f7

05 5月, 2016 1 次提交

treewide: replace dev->trans_start update with helper · 860e9538

由 Florian Westphal 提交于 5月 03, 2016

Replace all trans_start updates with netif_trans_update helper.
change was done via spatch:

struct net_device *d;
@@
- d->trans_start = jiffies
+ netif_trans_update(d)

Compile tested only.

Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: linux-xtensa@linux-xtensa.org
Cc: linux1394-devel@lists.sourceforge.net
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: MPT-FusionLinux.pdl@broadcom.com
Cc: linux-scsi@vger.kernel.org
Cc: linux-can@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-omap@vger.kernel.org
Cc: linux-hams@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: devel@driverdev.osuosl.org
Cc: b.a.t.m.a.n@lists.open-mesh.org
Cc: linux-bluetooth@vger.kernel.org
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Acked-by: NFelipe Balbi <felipe.balbi@linux.intel.com>
Acked-by: NMugunthan V N <mugunthanvnm@ti.com>
Acked-by: NAntonio Quartulli <a@unstable.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

860e9538

03 3月, 2016 1 次提交

IB/ipoib: Add handling for sending of skb with many frags · 78a50a5e

由 Hans Westgaard Ry 提交于 3月 02, 2016

IPoIB converts skb-fragments to sge adding 1 extra sge when SG is enabled.
Current codepath assumes that the max number of sge a device support
is at least MAX_SKB_FRAGS+1, there is no interaction with upper layers
to limit number of fragments in an skb if a device suports fewer
sges. The assumptions also lead to requesting a fixed number of sge
when IPoIB creates queue-pairs with SG enabled.

A fallback/slowpath is implemented using skb_linearize to
handle cases where the conversion would result in more sges than supported.
Signed-off-by: NHans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: NHåkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: NWei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

78a50a5e

23 12月, 2015 1 次提交

IB/ulps: Avoid calling ib_query_device · 4a061b28

由 Or Gerlitz 提交于 12月 18, 2015

Instead, use the cached copy of the attributes present on the device.
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4a061b28

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功