Commits · 39b09a1a121cb22820c374f4e92f7ca34be1b75d · openanolis / cloud-kernel

20 Jan, 2016 38 commits

svcrdma: Add gfp flags to svc_rdma_post_recv() · 39b09a1a

Committed by Chuck Lever on Jan 07, 2016

svc_rdma_post_recv() allocates pages for receive buffers on-demand.
It uses GFP_KERNEL so the allocator tries hard, and may sleep. But
I'm about to add a call to svc_rdma_post_recv() from a function
that may not sleep.

Since all svc_rdma_post_recv() call sites can tolerate its failure,
allow it to fail if the page allocator returns nothing. Longer term,
receive buffers, being a finite resource per-connection, should be
pre-allocated and re-used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

39b09a1a

svcrdma: Remove unused req_map and ctxt kmem_caches · 71810ef3

Committed by Chuck Lever on Jan 07, 2016

Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

71810ef3

svcrdma: Improve allocation of struct svc_rdma_req_map · 2fe81b23

Committed by Chuck Lever on Jan 07, 2016

To ensure this allocation cannot fail and will not sleep,
pre-allocate the req_map structures per-connection.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

2fe81b23

svcrdma: Improve allocation of struct svc_rdma_op_ctxt · cc886c9f

Committed by Chuck Lever on Jan 07, 2016

When the maximum payload size of NFS READ and WRITE was increased
by commit cc9a903d ("svcrdma: Change maximum server payload back
to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt
increased to over 6KB (on x86_64). That makes allocating one of
these from a kmem_cache more likely to fail in situations when
system memory is exhausted.

Since I'm about to add a caller where this allocation must always
work _and_ it cannot sleep, pre-allocate ctxts for each connection.

Another motivation for this change is that NFSv4.x servers are
required by specification not to drop NFS requests. Pre-allocating
memory resources reduces the likelihood of a drop.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

cc886c9f
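
The pre-allocation idea described in the commit above can be illustrated with a small userspace sketch. This is not the svcrdma code: `struct ctxt`, `struct ctxt_pool`, and the `pool_*` functions are hypothetical names, and the sketch omits the locking a real per-connection pool would need. All memory is obtained once at setup, so taking a context later never calls the allocator and cannot sleep.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-connection context object. */
struct ctxt {
    struct ctxt *next;   /* free-list link */
    char payload[64];    /* placeholder for real per-request state */
};

/* Hypothetical per-connection pool: all memory is obtained up front. */
struct ctxt_pool {
    struct ctxt *storage;  /* one contiguous allocation */
    struct ctxt *free;     /* singly linked free list */
};

/* Allocate every ctxt at connection setup, where failure can still be handled. */
static int pool_init(struct ctxt_pool *pool, size_t count)
{
    pool->storage = calloc(count, sizeof(*pool->storage));
    pool->free = NULL;
    if (!pool->storage)
        return -1;
    for (size_t i = 0; i < count; i++) {
        pool->storage[i].next = pool->free;
        pool->free = &pool->storage[i];
    }
    return 0;
}

/* Take a ctxt without calling the allocator; NULL only if the pool is empty. */
static struct ctxt *pool_get(struct ctxt_pool *pool)
{
    struct ctxt *c = pool->free;

    if (c)
        pool->free = c->next;
    return c;
}

/* Return a ctxt to circulation. */
static void pool_put(struct ctxt_pool *pool, struct ctxt *c)
{
    c->next = pool->free;
    pool->free = c;
}

/* Release the single backing allocation at connection teardown. */
static void pool_destroy(struct ctxt_pool *pool)
{
    free(pool->storage);
    pool->storage = NULL;
    pool->free = NULL;
}
```

Sizing the pool to the number of requests a connection can have in flight is what makes "get" effectively infallible; the sketch leaves that sizing decision to the caller.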

svcrdma: Clean up process_context() · ced4ac0c

Committed by Chuck Lever on Jan 07, 2016

Be sure the completed ctxt is put in every path.

The xprt enqueue can take a while, so put the completed ctxt back
in circulation _before_ enqueuing the xprt.

Remove/disable debugging.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ced4ac0c

svcrdma: Clean up rdma_create_xprt() · 3d61677c

Committed by Chuck Lever on Jan 07, 2016

kzalloc is used here, so setting the atomic fields to zero is
unnecessary. sc_ord is set again in handle_connect_req. The other
fields are re-initialized in svc_rdma_accept().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3d61677c

IB/core: Use hop-limit from IP stack for RoCE · c3efe750

Committed by Matan Barak on Jan 04, 2016

Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for
RoCE. Fix that by taking ip4_dst_hoplimit and ip6_dst_hoplimit as
hop limit values.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

c3efe750

IB/core: Rename rdma_addr_find_dmac_by_grh · f7f4b23e

Committed by Matan Barak on Jan 04, 2016

rdma_addr_find_dmac_by_grh resolves dmac, vlan_id and if_index, and a
downstream patch will also add hop_limit as an output parameter,
thus we rename it to rdma_addr_find_l2_eth_by_grh.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

f7f4b23e

IB/cm: Fix a recently introduced deadlock · 4bfdf635

Committed by Bart Van Assche on Jan 01, 2016

ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock
that can be locked from inside an interrupt handler. Hence do not
enable interrupts inside cm_enter_timewait() if called with interrupts
disabled.

This patch fixes e.g. the following deadlock:

Acked-by: Erez Shitrit <erezsh@mellanox.com>

=================================
[ INFO: inconsistent lock state ]
4.4.0-rc7+ #1 Tainted: G            E
---------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
(&(&cm_id_priv->lock)->rlock){?.+...}, at: [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
{HARDIRQ-ON-W} state was registered at:
  [<ffffffff810a3c11>] mark_held_locks+0x71/0x90
  [<ffffffff810a3e87>] trace_hardirqs_on_caller+0xa7/0x1c0
  [<ffffffff810a3fad>] trace_hardirqs_on+0xd/0x10
  [<ffffffff8151c40b>] _raw_spin_unlock_irq+0x2b/0x40
  [<ffffffffa036ea8e>] cm_enter_timewait+0xae/0x100 [ib_cm]
  [<ffffffffa036ff76>] ib_send_cm_drep+0xb6/0x190 [ib_cm]
  [<ffffffffa052ed08>] srp_cm_handler+0x128/0x1a0 [ib_srp]
  [<ffffffffa0370340>] cm_process_work+0x20/0xf0 [ib_cm]
  [<ffffffffa0371335>] cm_dreq_handler+0x135/0x2c0 [ib_cm]
  [<ffffffffa03733c5>] cm_work_handler+0x75/0xd0 [ib_cm]
  [<ffffffff8107184d>] process_one_work+0x1bd/0x460
  [<ffffffff81073148>] worker_thread+0x118/0x420
  [<ffffffff81078454>] kthread+0xe4/0x100
  [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70
irq event stamp: 1672286
hardirqs last  enabled at (1672283): [<ffffffff81408ec0>] poll_idle+0x10/0x80
hardirqs last disabled at (1672284): [<ffffffff8151d304>] common_interrupt+0x84/0x89
softirqs last  enabled at (1672286): [<ffffffff8105b4dc>] _local_bh_enable+0x1c/0x50
softirqs last disabled at (1672285): [<ffffffff8105b697>] irq_enter+0x47/0x70

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&cm_id_priv->lock)->rlock);
  <Interrupt>
    lock(&(&cm_id_priv->lock)->rlock);

 *** DEADLOCK ***

no locks held by swapper/8/0.

stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: G            E   4.4.0-rc7+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
 ffff88045af5e950 ffff88046e503a88 ffffffff81251c1b 0000000000000007
 0000000000000006 0000000000000003 ffff88045af5ddc0 ffff88046e503ad8
 ffffffff810a32f4 0000000000000000 0000000000000000 0000000000000001
Call Trace:
 <IRQ>  [<ffffffff81251c1b>] dump_stack+0x4f/0x74
 [<ffffffff810a32f4>] print_usage_bug+0x184/0x190
 [<ffffffff810a36e2>] mark_lock_irq+0xf2/0x290
 [<ffffffff810a3995>] mark_lock+0x115/0x1b0
 [<ffffffff810a3b8c>] mark_irqflags+0x15c/0x170
 [<ffffffff810a4fef>] __lock_acquire+0x1ef/0x560
 [<ffffffff810a53c2>] lock_acquire+0x62/0x80
 [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
 [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
 [<ffffffffa036f031>] ib_cm_notify+0x31/0x100 [ib_cm]
 [<ffffffffa0637f24>] srpt_qp_event+0x54/0xd0 [ib_srpt]
 [<ffffffffa0196052>] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib]
 [<ffffffffa00775b9>] mlx4_qp_event+0x69/0xd0 [mlx4_core]
 [<ffffffffa006000e>] mlx4_eq_int+0x51e/0xd50 [mlx4_core]
 [<ffffffffa006084f>] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
 [<ffffffff810b67b0>] handle_irq_event_percpu+0x40/0x110
 [<ffffffff810b68bf>] handle_irq_event+0x3f/0x70
 [<ffffffff810ba7f9>] handle_edge_irq+0x79/0x120
 [<ffffffff81007f3d>] handle_irq+0x5d/0x130
 [<ffffffff810071fd>] do_IRQ+0x6d/0x130
 [<ffffffff8151d309>] common_interrupt+0x89/0x89
 <EOI>  [<ffffffff8140895f>] cpuidle_enter_state+0xcf/0x200
 [<ffffffff81408aa2>] cpuidle_enter+0x12/0x20
 [<ffffffff810990d6>] call_cpuidle+0x36/0x60
 [<ffffffff81099163>] cpuidle_idle_call+0x63/0x110
 [<ffffffff8109930a>] cpu_idle_loop+0xfa/0x130
 [<ffffffff8109934e>] cpu_startup_entry+0xe/0x10
 [<ffffffff8103c443>] start_secondary+0x83/0x90

Fixes: commit be4b4993 ("IB/cm: Do not queue work to a device that's going away")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

4bfdf635
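
The hazard in the commit above is an inner function that re-enables interrupts while its caller still expects them to be off. The following userspace simulation shows the difference between the unconditional `_irq` lock variants and the state-preserving `irqsave`/`irqrestore` variants. Everything here is a model: `sim_irqs_enabled` and the `sim_*` functions are hypothetical stand-ins, not kernel APIs, and the sketch does not claim to reproduce the exact diff of the fix.

```c
#include <stdbool.h>

/* Simulated CPU interrupt-enable flag; a stand-in for real hardware state. */
static bool sim_irqs_enabled = true;

/* Models spin_lock_irq(): disables interrupts unconditionally. */
static void sim_lock_irq(void)
{
    sim_irqs_enabled = false;
}

/* Models spin_unlock_irq(): enables interrupts unconditionally. */
static void sim_unlock_irq(void)
{
    sim_irqs_enabled = true;
}

/* Models spin_lock_irqsave(): remembers the previous state, then disables. */
static bool sim_lock_irqsave(void)
{
    bool flags = sim_irqs_enabled;

    sim_irqs_enabled = false;
    return flags;
}

/* Models spin_unlock_irqrestore(): puts back whatever state was saved. */
static void sim_unlock_irqrestore(bool flags)
{
    sim_irqs_enabled = flags;
}

/* Inner helper using the _irq variants: unsafe if the caller disabled interrupts. */
static void inner_unsafe(void)
{
    sim_lock_irq();
    sim_unlock_irq();   /* interrupts are now on, regardless of the caller */
}

/* Inner helper using the irqsave variants: preserves the caller's state. */
static void inner_safe(void)
{
    bool flags = sim_lock_irqsave();

    sim_unlock_irqrestore(flags);
}
```

If a caller holds an outer lock taken with the irqsave variant and then calls `inner_unsafe()`, interrupts come back on while the outer lock is still held, which is the window in which an interrupt handler could try to take the same lock.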

IB/srpt: Fix the RDMA completion handlers · 19f57298

Committed by Bart Van Assche on Dec 31, 2015

Prevent the following kernel crash from being triggered when
processing an RDMA completion:

BUG: unable to handle kernel paging request at 0000000100000198
IP: [<ffffffff810a4ea2>] __lock_acquire+0xa2/0x560
Call Trace:
 [<ffffffff810a53c2>] lock_acquire+0x62/0x80
 [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
 [<ffffffffa04fd437>] srpt_rdma_read_done+0x57/0x120 [ib_srpt]
 [<ffffffffa0144dd3>] __ib_process_cq+0x43/0xc0 [ib_core]
 [<ffffffffa0145115>] ib_cq_poll_work+0x25/0x70 [ib_core]
 [<ffffffff8107184d>] process_one_work+0x1bd/0x460
 [<ffffffff81073148>] worker_thread+0x118/0x420
 [<ffffffff81078454>] kthread+0xe4/0x100
 [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70

Fixes: commit 59fae4de ("IB/srpt: chain RDMA READ/WRITE requests").
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

19f57298

irq_poll: Fix irq_poll_sched() · 2ee177e9

Committed by Bart Van Assche on Dec 31, 2015

The IRQ_POLL_F_SCHED bit is set as long as polling is ongoing.
This means that irq_poll_sched() must proceed if this bit has
not yet been set.

Fixes: commit ea51190c ("irq_poll: fold irq_poll_sched_prep into irq_poll_sched").
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

2ee177e9
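
The scheduling rule in the commit above (proceed only when the "scheduled" bit was not already set) is easy to get inverted. Below is a minimal userspace sketch using C11 atomics. `struct poll_obj` and the `poll_*` functions are hypothetical names, and the `queued` counter merely stands in for queuing the object and raising a softirq.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical poll object; the flag plays the role of IRQ_POLL_F_SCHED. */
struct poll_obj {
    atomic_flag sched;   /* set while polling is scheduled or ongoing */
    int queued;          /* counts how many times the object was queued */
};

static void poll_init(struct poll_obj *p)
{
    atomic_flag_clear(&p->sched);
    p->queued = 0;
}

/* Returns true if this call queued the object, false if it was already scheduled. */
static bool poll_sched(struct poll_obj *p)
{
    /* test-and-set returns the previous value: true means already scheduled. */
    if (atomic_flag_test_and_set(&p->sched))
        return false;
    p->queued++;   /* stand-in for adding to a list and kicking the poller */
    return true;
}

/* Called when polling finishes, so the object can be scheduled again. */
static void poll_complete(struct poll_obj *p)
{
    atomic_flag_clear(&p->sched);
}
```

With the test inverted, the first caller would return early and nothing would ever be queued, which matches the symptom the commit message describes.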

IB/core: Fix dereference before check · 9506902b

Committed by Matan Barak on Dec 30, 2015

Sparse complains about dereference before check. Fix this by
moving the check before the dereference.

Fixes: 20029832 ('IB/core: Validate route when we init ah')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9506902b

IB/core: Eliminate sparse false context imbalance warning · 2e2cdace

Committed by Matan Barak on Dec 30, 2015

When the write_gid function needs to do a sleepable operation, it unlocks
table->rwlock and then relocks it. Sparse complains about context
imbalance.

This is safe as write_gid is always called with table->rwlock held.
write_gid protects from simultaneous writes to this GID entry
by setting the GID_TABLE_ENTRY_INVALID flag.

Fixes: 9c584f04 ('IB/core: Change per-entry lock in RoCE GID table to one lock')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

2e2cdace

IB/core: sysfs.c: Fix PerfMgt ClassPortInfo handling · 6e2a51a0

Committed by Hal Rosenstock on Dec 29, 2015

Port number is not part of ClassPortInfo attribute but is
still needed as a parameter when invoking process_mad.

To properly handle this attribute, port_num is added as a
parameter to get_counter_table and get_perf_mad was changed
not to store port_num in the attribute itself when it's
querying the ClassPortInfo attribute.

This handles an issue pointed out by Matan Barak <matanb@dev.mellanox.co.il>.

Fixes: 145d9c54 ('IB/core: Display extended counter set if available')
Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

6e2a51a0

IB/core: Remove set-but-not-used variable from ib_sg_to_pages() · b6aeb980

Committed by Bart Van Assche on Dec 29, 2015

Detected this by building the IB core with W=1. See also patch
"IB core: Fix ib_sg_to_pages()" (commit 8f5ba10e).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Leon Romanovsky <leon.romanovsky@mellanox.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>

b6aeb980

IB/mlx5: Fix passing casted pointer in mlx5_query_port_roce · c876a1b7

Committed by Leon Romanovsky on Jan 09, 2016

Fix static checker warning:
        drivers/infiniband/hw/mlx5/main.c:149 mlx5_query_port_roce()
        warn: passing casted pointer '&props->qkey_viol_cntr' to
        'mlx5_query_nic_vport_qkey_viol_cntr()' 32 vs 16.

Fixes: 3f89a643 ("IB/mlx5: Extend query_device/port to support RoCE")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

c876a1b7
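
The "32 vs 16" warning in the commit above refers to passing the address of a 32-bit field, cast to a 16-bit pointer, into a function that writes 16 bits. A common remedy is a correctly typed temporary, sketched below in plain C. `query_qkey_viol_cntr()`, `struct port_props`, and `fill_port_props()` are hypothetical stand-ins, and the value 0x1234 is made up for the example; the actual mlx5 change may differ in detail.

```c
#include <stdint.h>

/* Hypothetical query helper that writes a 16-bit counter via its out-parameter. */
static int query_qkey_viol_cntr(uint16_t *out)
{
    *out = 0x1234;   /* made-up value for the example */
    return 0;
}

/* Hypothetical properties struct whose field is wider than what the helper writes. */
struct port_props {
    uint32_t qkey_viol_cntr;
};

/*
 * Use a correctly typed temporary instead of casting &props->qkey_viol_cntr
 * to uint16_t *. The cast would write only 2 of the 4 bytes, and which 2
 * depends on endianness.
 */
static int fill_port_props(struct port_props *props)
{
    uint16_t cntr;
    int err = query_qkey_viol_cntr(&cntr);

    if (err)
        return err;
    props->qkey_viol_cntr = cntr;   /* widening assignment sets all 32 bits */
    return 0;
}
```

The widening assignment also clears any stale upper bits left in the destination field, which the cast-based version would leave untouched.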

IB/mad: use CQ abstraction · d53e11fd

Committed by Christoph Hellwig on Jan 05, 2016

Remove the local workqueue to process mad completions and use the CQ API
instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d53e11fd

IB/mad: pass ib_mad_send_buf explicitly to the recv_handler · ca281265

Committed by Christoph Hellwig on Jan 04, 2016

Stop abusing wr_id and just pass the parameter explicitly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ca281265

infiniband: Replace memset with eth_zero_addr · 39f42655

Committed by Lucas Tanure on Jan 19, 2016

Use eth_zero_addr to assign the zero address to the given address
array instead of memset when the memset fill value is zero.

Signed-off-by: Lucas Tanure <tanure@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

39f42655

IB/mlx5: Delete locally redefined variable · 50ca6ed2

Committed by Leon Romanovsky on Jan 19, 2016

Fix the following sparse warning:
drivers/infiniband/hw/mlx5/main.c:1061:29: warning: symbol 'pfn' shadows an earlier one
drivers/infiniband/hw/mlx5/main.c:1030:21: originally declared here

Fixes: d69e3bcf ('IB/mlx5: Mmap the HCA's core clock register to user-space')
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

50ca6ed2

net/mlx4: Remove unused macro · f25bf197

Committed by Moni Shoua on Jan 14, 2016

The macro mlx4_foreach_non_ib_transport_port() is not used anywhere. Remove it.

Fixes: aa9a2d51 ("mlx4: Activate RoCE/SRIOV")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

f25bf197

IB/mlx4: Take source mac from AH instead from the port · 1049f138

Committed by Moni Shoua on Jan 14, 2016

In commit dbf727de ("IB/core: Use GID table in AH creation and dmac
resolution") we copy the source mac to mlx4_ah from the attributes of
the gid at ib_ah_attr.grh.sgid_index. Now we can use it.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

1049f138

IB/mlx4: Initialize hop_limit when creating address handle · 4e408167

Committed by Matan Barak on Jan 14, 2016

The hop limit value wasn't copied from the attributes when the AH was
created. This may cause packets for unconnected services to be dropped by
routers when the endpoints are not in the same subnet.

Fixes: fa417f7b ("IB/mlx4: Add support for IBoE")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

4e408167

IB/mlx5: Expose correct maximum number of CQE capacity · 9f177686

Committed by Leon Romanovsky on Jan 14, 2016

Maximum number of EQE capacity per CQ was mistakenly exposed
as CQE. Fix that.

Fixes: 938fe83c ("net/mlx5_core: New device capabilities handling")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9f177686

iw_cxgb4: Take clip reference before starting IPv6 listen · 28de1f74

Committed by Hariprasad S on Jan 13, 2016

The h/w is designed in such a way that, if you do anything IPv6
related, a valid clip entry must be there. So take a clip reference
before creating IPv6 listening servers, and then if we fail to
create the server, release the clip entry.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

28de1f74

iw_cxgb4: Fixes GW-Basic labels to meaningful error names · 4275a5b2

Committed by Hariprasad S on Jan 12, 2016

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

4275a5b2

iw_cxgb4: Fixes static checker warning in c4iw_rdev_open() · 82b1df1b

Committed by Hariprasad S on Jan 12, 2016

Commit c5dfb000 ("iw_cxgb4: Pass qid range to user space driver")
from Dec 11, 2015, leads to the following static checker warning:

        drivers/infiniband/hw/cxgb4/device.c:857 c4iw_rdev_open()
        warn: variable dereferenced before check 'rdev->status_page'

Also, we weren't deallocating the ocqp pool in the error path when we
failed to allocate the status page. Fix that too.

Fixes: c5dfb000 ("iw_cxgb4: Pass qid range to user space driver")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

82b1df1b

IB/cma: allocating too much memory in make_cma_ports() · a7d0e959

Committed by Dan Carpenter on Jan 12, 2016

The issue here is that there is a cut and paste bug.  When we allocate
cma_dev_group->default_ports_group we use "sizeof(*cma_dev_group->ports)"
instead of "sizeof(*cma_dev_group->default_ports_group)".

We're bumping up against the 80 character limit so I introduced a new
local pointer "ports_group" to get around that.

Fixes: 045959db ('IB/cma: Add configfs for rdma_cm')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a7d0e959
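
The cut-and-paste bug in the commit above comes from sizing one allocation with the element type of a different pointer. Writing `sizeof(*ptr)` against the very pointer being assigned avoids that class of mistake, and a short local name keeps the line within 80 columns. The sketch below is a userspace illustration: `struct port_group`, `struct default_group`, `struct dev_group`, and `make_ports()` are hypothetical stand-ins, not the rdma_cm configfs types.

```c
#include <stdlib.h>

/* Hypothetical types of clearly different sizes, standing in for the two groups. */
struct port_group { char data[256]; };
struct default_group { void *ptr; };

struct dev_group {
    struct port_group *ports;
    struct default_group **default_ports_group;
};

/* Allocates both arrays, sizing each one from the pointer it is assigned to. */
static int make_ports(struct dev_group *g, size_t nports)
{
    struct port_group *ports;
    struct default_group **ports_group;   /* short local name, as in the commit */

    ports = calloc(nports, sizeof(*ports));
    ports_group = calloc(nports + 1, sizeof(*ports_group));
    if (!ports || !ports_group) {
        free(ports);
        free(ports_group);
        return -1;
    }
    g->ports = ports;
    g->default_ports_group = ports_group;
    return 0;
}
```

In this sketch the wrong `sizeof` would request 256 bytes per element for an array of pointers, which wastes memory rather than corrupting it, consistent with the commit title.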

RDMA/nes: checking for NULL instead of IS_ERR · bc1251e6

Committed by Dan Carpenter on Jan 12, 2016

nes_reg_phys_mr() returns ERR_PTRs on error.  It doesn't return NULL.

This bug has been there for a while, but we recently changed from
calling a function pointer to calling nes_reg_phys_mr() directly so now
Smatch is able to detect the bug.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>

bc1251e6
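
To see why a NULL check misses an ERR_PTR return, as in the commit above, here is a userspace re-creation of the kernel's error-pointer convention. The `ERR_PTR`, `PTR_ERR`, and `IS_ERR` helpers below are simplified local definitions written for this example, and `reg_mr()` and `use_mr()` are hypothetical functions, not the nes driver code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's ERR_PTR convention, for illustration only. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO addresses of the address space. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct mr { int key; };

/* Hypothetical registration function: reports failure as ERR_PTR, never NULL. */
static struct mr *reg_mr(int fail)
{
    static struct mr the_mr = { .key = 42 };

    return fail ? ERR_PTR(-ENOMEM) : &the_mr;
}

/* Correct caller: tests IS_ERR(). A "!mr" test would let the error pointer through. */
static int use_mr(int fail)
{
    struct mr *mr = reg_mr(fail);

    if (IS_ERR(mr))
        return (int)PTR_ERR(mr);
    return mr->key;
}
```

Because the error pointer is non-NULL, a caller that only checks for NULL would go on to dereference an invalid address.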

IB/qib: Support creating qps with GFP_NOIO flag · fbbeb863

Committed by Vinit Agnihotri on Jan 11, 2016

The current code is problematic when QP creation and IPoIB are used to
support NFS and NFS needs to do I/O for paging purposes. In that case, the
GFP_KERNEL allocation in qib_qp.c causes a deadlock in tight memory
situations.

This fix adds support for creating a queue pair with the GFP_NOIO flag, for
connected mode only, so that queue pair creation fails cleanly in those
situations.

Cc: <stable@vger.kernel.org> # 3.16+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

fbbeb863

IB/sysfs: Fix sparse warning on attr_id · 65487fdc

Committed by Ira Weiny on Jan 03, 2016

The attribute ID was declared as an int while the value should really be a
big-endian 16-bit value.

Fixes: 35c4cbb1 ("IB/core: Create get_perf_mad function in sysfs.c")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

65487fdc

RDMA/be2net: Remove open and close entry points · 9781808c

Committed by Devesh Sharma on Dec 24, 2015

Recently Doug Ledford reported a deadlock happening
between the ocrdma load sequence and the NetworkManager service
issuing "open" on the be2net interface.

The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.

A. be2net is sending administrative open/close event to ocrdma holding
device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
So sequence of locks is rtnl_lock---> device_list lock

B. When new ocrdma roce device gets registered, infiniband stack now
takes rtnl_lock in ib_register_device() in GID initialization routines.
So sequence of locks in this path is device_list lock ---> rtnl_lock.

This improper locking sequence causes deadlock.

In order to resolve the above deadlock condition, ocrdma introduced a
patch to stop listening to administrative open/close events generated by the
be2net driver. It now depends on the link-state-change async-event generated
by the CNA. This change leaves behind dead code which used to generate
administrative open/close events. This patch cleans up all that dead code
from be2net.

Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9781808c

RDMA/ocrdma: Depend on async link events from CNA · 3b1ea430

Committed by Devesh Sharma on Dec 24, 2015

Recently Doug Ledford reported a deadlock happening
between the ocrdma load sequence and the NetworkManager service
issuing "open" on the be2net interface.

The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.

A. be2net is sending administrative open/close event to ocrdma holding
   device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
   So sequence of locks is rtnl_lock---> device_list lock

B. When new ocrdma roce device gets registered, infiniband stack now
   takes rtnl_lock in ib_register_device() in GID initialization routines.
   So sequence of locks in this path is device_list lock ---> rtnl_lock.

This improper locking sequence causes deadlock.

With this patch we stop using the administrative open and close events
injected by the be2net driver. These events were used to dispatch PORT_ACTIVE
and PORT_ERROR events to the IB stack. This patch implements logic
to receive async-link-events generated by the CNA whenever a link-state-change
is detected. From now on, these async-events will be used to dispatch
PORT_ACTIVE and PORT_ERROR events to the IB stack.

Depending on async-events from the CNA removes the need to hold
device-list-mutex and thus breaks the busy-wait scenario.

Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3b1ea430

RDMA/ocrdma: Dispatch only port event when port state changes · d310a344

Committed by Devesh Sharma on Dec 24, 2015

Dispatch only the port event to the IB stack when the port state changes.
Don't explicitly modify QPs to error. Let the application listen to
port events on the async event queue or let the QP fail with a
retry-exceeded completion error.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d310a344

RDMA/ocrdma: Fix vlan-id assignment in qp parameters · a2addf94

Committed by Devesh Sharma on Dec 24, 2015

vlan-id is wrongly set to 0 when PFC is enabled.
Set the vlan-id configured by the user in the QP parameters.
In case a vlan interface is not used, emit a warning asking the
user to configure a vlan, and assign vlan-id 0 in the QP params.

Fixes: dbf727de ('IB/core: Use GID table in AH creation and dmac resolution')
Cc: Matan Barak <matanb@mellanox.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a2addf94

IB/cma: Fix RDMA port validation for iWarp · 64936773

Committed by Matan Barak on Jan 07, 2016

cma_validate_port wrongly assumed that Ethernet devices are RoCE
devices and thus their ndev should be matched in the GID table.
This broke the iWarp support. Fix that by matching the ndev only if
we work on a RoCE port.

Cc: <stable@vger.kernel.org> # 4.4.x-
Fixes: abae1b71 ('IB/cma: cma_validate_port should verify the port and netdevice')
Reported-by: Hariprasad Shenai <hariprasad@chelsio.com>
Tested-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

64936773

IB/qib: fix mcast detach when qp not attached · 09dc9cd6

Committed by Mike Marciniszyn on Jan 07, 2016

The code produces the following trace:

[1750924.419007] general protection fault: 0000 [#3] SMP
[1750924.420364] Modules linked in: nfnetlink autofs4 rpcsec_gss_krb5 nfsv4
dcdbas rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl dm_multipath nfs lockd
scsi_dh sunrpc fscache radeon ttm drm_kms_helper drm serio_raw parport_pc
ppdev i2c_algo_bit lpc_ich ipmi_si ib_mthca ib_qib dca lp parport ib_ipoib
mac_hid ib_cm i3000_edac ib_sa ib_uverbs edac_core ib_umad ib_mad ib_core
ib_addr tg3 ptp dm_mirror dm_region_hash dm_log psmouse pps_core
[1750924.420364] CPU: 1 PID: 8401 Comm: python Tainted: G D
3.13.0-39-generic #66-Ubuntu
[1750924.420364] Hardware name: Dell Computer Corporation PowerEdge
860/0XM089, BIOS A04 07/24/2007
[1750924.420364] task: ffff8800366a9800 ti: ffff88007af1c000 task.ti:
ffff88007af1c000
[1750924.420364] RIP: 0010:[<ffffffffa0131d51>] [<ffffffffa0131d51>]
qib_mcast_qp_free+0x11/0x50 [ib_qib]
[1750924.420364] RSP: 0018:ffff88007af1dd70  EFLAGS: 00010246
[1750924.420364] RAX: 0000000000000001 RBX: ffff88007b822688 RCX:
000000000000000f
[1750924.420364] RDX: ffff88007b822688 RSI: ffff8800366c15a0 RDI:
6764697200000000
[1750924.420364] RBP: ffff88007af1dd78 R08: 0000000000000001 R09:
0000000000000000
[1750924.420364] R10: 0000000000000011 R11: 0000000000000246 R12:
ffff88007baa1d98
[1750924.420364] R13: ffff88003ecab000 R14: ffff88007b822660 R15:
0000000000000000
[1750924.420364] FS:  00007ffff7fd8740(0000) GS:ffff88007fc80000(0000)
knlGS:0000000000000000
[1750924.420364] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1750924.420364] CR2: 00007ffff597c750 CR3: 000000006860b000 CR4:
00000000000007e0
[1750924.420364] Stack:
[1750924.420364]  ffff88007b822688 ffff88007af1ddf0 ffffffffa0132429
000000007af1de20
[1750924.420364]  ffff88007baa1dc8 ffff88007baa0000 ffff88007af1de70
ffffffffa00cb313
[1750924.420364]  00007fffffffde88 0000000000000000 0000000000000008
ffff88003ecab000
[1750924.420364] Call Trace:
[1750924.420364]  [<ffffffffa0132429>] qib_multicast_detach+0x1e9/0x350
[ib_qib]
[1750924.568035]  [<ffffffffa00cb313>] ? ib_uverbs_modify_qp+0x323/0x3d0
[ib_uverbs]
[1750924.568035]  [<ffffffffa0092d61>] ib_detach_mcast+0x31/0x50 [ib_core]
[1750924.568035]  [<ffffffffa00cc213>] ib_uverbs_detach_mcast+0x93/0x170
[ib_uverbs]
[1750924.568035]  [<ffffffffa00c61f6>] ib_uverbs_write+0xc6/0x2c0 [ib_uverbs]
[1750924.568035]  [<ffffffff81312e68>] ? apparmor_file_permission+0x18/0x20
[1750924.568035]  [<ffffffff812d4cd3>] ? security_file_permission+0x23/0xa0
[1750924.568035]  [<ffffffff811bd214>] vfs_write+0xb4/0x1f0
[1750924.568035]  [<ffffffff811bdc49>] SyS_write+0x49/0xa0
[1750924.568035]  [<ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f
[1750924.568035] Code: 66 2e 0f 1f 84 00 00 00 00 00 31 c0 5d c3 66 2e 0f 1f
84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 10
<f0> ff 8f 40 01 00 00 74 0e 48 89 df e8 8e f8 06 e1 5b 5d c3 0f
[1750924.568035] RIP  [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50
[ib_qib]
[1750924.568035]  RSP <ffff88007af1dd70>
[1750924.650439] ---[ end trace 73d5d4b3f8ad4851 ]

The fix is to note the qib_mcast_qp that was found. If none is found, then
return EINVAL indicating the error.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

09dc9cd6
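
The fix described in the commit above (remember which entry matched, and return EINVAL when nothing matched) follows a general list-search pattern: never use the loop cursor after the loop as if it were a match. A minimal userspace sketch with a singly linked list follows. `struct mcast_qp`, `struct mcast_group`, and `mcast_detach()` are hypothetical names, not the qib driver's data structures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical attachment record linking a QP number into a group's list. */
struct mcast_qp {
    struct mcast_qp *next;
    int qp_num;
};

struct mcast_group {
    struct mcast_qp *qps;
};

/*
 * Unlink the entry for qp_num. The match is recorded in "found"; the loop
 * cursor is never used after the loop. Returns 0 and stores the unlinked
 * entry in *out on success, -EINVAL if the QP was not attached.
 */
static int mcast_detach(struct mcast_group *grp, int qp_num,
                        struct mcast_qp **out)
{
    struct mcast_qp **link;
    struct mcast_qp *found = NULL;

    for (link = &grp->qps; *link; link = &(*link)->next) {
        if ((*link)->qp_num == qp_num) {
            found = *link;
            *link = found->next;
            break;
        }
    }
    if (!found)
        return -EINVAL;
    found->next = NULL;
    *out = found;
    return 0;
}
```

Only a successful detach hands an entry back to the caller, so any cleanup step runs on a real attachment rather than on whatever the cursor last pointed at.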

IB/IPoIB: Fix kernel panic on multicast flow · 50be28de

Committed by Erez Shitrit on Jan 07, 2016

ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the
parameter mcast->dev. That mcast is a temporary variable (used as an
iterator) that may be uninitialized.
There is no need to pass the variable dev to the function, as each mcast
has its dev as a member in the mcast struct.

This causes the following panic:
RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib]
RSP: 0018: EFLAGS: 00010246
RAX: f0201 RBX: 24e00 RCX: 00000
....
....
Stack:
Call Trace:
	ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib]
	ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib]
	process_one_work+0x164/0x470
	worker_thread+0x11d/0x420
	...

Fixes: 5a0e81f6 ('IB/IPoIB: factor out common multicast list removal code')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reported-by: Doron Tsur <doront@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

50be28de
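
The interface change described in the commit above can be sketched as follows: the list-removal helper takes only the list and reads the device from each entry, so no caller can pass a device pointer obtained through a stale iterator. This is a userspace illustration with hypothetical types (`struct net_dev`, `struct mcast`) and functions; the `leaves` counter only stands in for the real leave operation.

```c
#include <stddef.h>

/* Hypothetical device; the counter records how many leave operations ran. */
struct net_dev {
    int leaves;
};

struct mcast {
    struct mcast *next;
    struct net_dev *dev;   /* each entry carries its own device pointer */
};

/* Leaves one group using the device stored in the entry itself. */
static void mcast_leave(struct mcast *m)
{
    m->dev->leaves++;
}

/*
 * Takes only the list. There is no separate dev argument, so the caller
 * cannot pass a pointer derived from a stale or uninitialized iterator.
 */
static void mcast_remove_list(struct mcast *head)
{
    for (struct mcast *m = head; m; m = m->next)
        mcast_leave(m);
}
```

Dropping the redundant parameter removes the bug by construction: an empty list simply results in no work, regardless of what any leftover iterator variable holds.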

27 Dec, 2015 1 commit

IB/iser: Support the remote invalidation exception · 59caaed7

Committed by Jenny Derzhavetz on Dec 24, 2015

Declare that we support remote invalidation in case we are:
1. using fastreg method
2. always registering memory

Detect the invalidated rkey from the work completion info so we
won't invalidate it locally. The spec mandates that we must not rely
on the target remotely invalidating our rkey, so we must check it upon
a receive (scsi response) completion.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

59caaed7

24 Dec, 2015 1 commit

IB/iser: Change the increment rkey flow logic · e26d2d21

Committed by Sagi Grimberg on Dec 09, 2015

When we enable remote invalidate support we won't want to perform
local invalidates at the same time we do today, but we still need
to get new rkeys.  So, decouple the rkey update from the local
invalidate and tie it to memory reg instead.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e26d2d21
