Commits · 1e9877902dc7e11d2be038371c6fbf2dfcd469d7 · openanolis / cloud-kernel

16 Feb, 2016 1 commit

mm/gup: Introduce get_user_pages_remote() · 1e987790

Authored by Dave Hansen on Feb 12, 2016

For protection keys, we need to understand whether protections
should be enforced in software or not.  In general, we enforce
protections when working on our own task, but not when on others.
We call these "current" and "remote" operations.

This patch introduces a new get_user_pages() variant:

        get_user_pages_remote()

It is a replacement for get_user_pages() calls that operate on a
non-current tsk/mm.

We also introduce a new gup flag: FOLL_REMOTE which can be used
for the "__" gup variants to get this new behavior.

The uprobes is_trap_at_addr() location holds mmap_sem and
calls get_user_pages(current->mm) on an instruction address.  This
makes it a pretty unique gup caller.  Being an instruction access
and also really originating from the kernel (vs. the app), I opted
to consider this a 'remote' access where protection keys will not
be enforced.

Without protection keys, this patch should not change any behavior.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: jack@suse.cz
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210154.3F0E51EA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
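
The "current" versus "remote" distinction described in this commit can be modelled in a few lines of userspace C. This is only a toy sketch: the `toy_*` functions are hypothetical stand-ins rather than kernel APIs, and the numeric value given to `FOLL_REMOTE` here is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag value for this sketch; consult the kernel headers for the real one. */
#define FOLL_REMOTE 0x2000u

static_assert((FOLL_REMOTE & (FOLL_REMOTE - 1)) == 0, "flag must be a single bit");

/* Toy predicate: protection keys are enforced only for "current" accesses. */
static bool toy_pkeys_enforced(unsigned int gup_flags)
{
	return (gup_flags & FOLL_REMOTE) == 0;
}

/* Toy stand-in for a gup call on the caller's own mm: flags pass through. */
static bool toy_get_user_pages(unsigned int gup_flags)
{
	return toy_pkeys_enforced(gup_flags);
}

/* Toy stand-in for the remote variant: it always adds FOLL_REMOTE. */
static bool toy_get_user_pages_remote(unsigned int gup_flags)
{
	return toy_pkeys_enforced(gup_flags | FOLL_REMOTE);
}
```

In this model, a caller of one of the "__" variants gets the remote behavior by passing `FOLL_REMOTE` itself, which matches the description of the new flag above.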

13 Feb, 2016 2 commits

IB/mlx5: Fix RC transport send queue overhead computation · 75c1657e

Authored by Leon Romanovsky on Feb 11, 2016

Fix the RC QPs send queue overhead computation to take into account
two additional segments in the WQE which are needed for registration
operations.

The ATOMIC and UMR segments can't coexist, so choose the maximum of
the two.

Commit 9e65dc37 ("IB/mlx5: Fix RC transport send queue overhead
computation") was intended to update the RC transport, as its commit
message states, but added the code to the UC transport instead.

Fixes: 9e65dc37 ("IB/mlx5: Fix RC transport send queue overhead computation")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
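
The "choose the maximum" rule can be sketched as follows. This is a hedged illustration rather than the driver code: `rc_sq_overhead()` and its parameters are hypothetical, and all segment sizes are supplied by the caller instead of being taken from the mlx5 headers.

```c
#include <assert.h>
#include <stddef.h>

static size_t max_size(size_t a, size_t b)
{
	return a > b ? a : b;
}

/*
 * base_sz:   segments that every RC send WQE carries
 * atomic_sz: extra segments needed by an atomic operation
 * umr_sz:    extra segments needed by a registration (UMR) operation
 * All three are placeholders chosen by the caller.
 */
static size_t rc_sq_overhead(size_t base_sz, size_t atomic_sz, size_t umr_sz)
{
	/* The two kinds never share a WQE, so reserve room for the larger one only. */
	return base_sz + max_size(atomic_sz, umr_sz);
}
```

Summing the two optional parts instead would over-reserve space, while counting only one of them could under-reserve it.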

IB/ipoib: fix for rare multicast join race condition · 08bc3276

Authored by Alex Estrin on Feb 11, 2016

A narrow window for a race condition still exists between the
multicast join thread and the *dev_flush workers.
A kernel crash caused by prolonged erratic link state changes
was observed (most likely due to faulty cabling):

[167275.656270] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[167275.665973] IP: [<ffffffffa05f8f2e>] ipoib_mcast_join+0xae/0x1d0 [ib_ipoib]
[167275.674443] PGD 0
[167275.677373] Oops: 0000 [#1] SMP
...
[167275.977530] Call Trace:
[167275.982225]  [<ffffffffa05f92f0>] ? ipoib_mcast_free+0x200/0x200 [ib_ipoib]
[167275.992024]  [<ffffffffa05fa1b7>] ipoib_mcast_join_task+0x2a7/0x490 [ib_ipoib]
[167276.002149]  [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[167276.010754]  [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[167276.019088]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[167276.027737]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
This is the spot that was hit:
ipoib_mcast_join() {
..............
      rec.qkey      = priv->broadcast->mcmember.qkey;
                                       ^^^^^^^
.....
 }
The proposed patch prevents the multicast join task from continuing
if a link state change is detected.
Signed-off-by: Alex Estrin <alex.estrin@intel.com>

Changes from v4:
- as suggested by Doug Ledford, optimized spinlock usage,
i.e. ipoib_mcast_join() is called with lock held.
Changes from v3:
- sync with priv->lock before flag check.
Changes from v2:
- Move check for OPER_UP flag state to mcast_join() to
ensure no event worker is in progress.
- minor style fixes.
Changes from v1:
- No need to lock again if error detected.
Signed-off-by: Doug Ledford <dledford@redhat.com>

12 Feb, 2016 1 commit

IB/core: Fix reading capability mask of the port info class · ee50aeac

Authored by Eran Ben Elisha on Feb 11, 2016

When checking a specific attribute in a bit mask, bitwise AND must be
used rather than logical AND. Fix that.

Fixes: 145d9c54 ('IB/core: Display extended counter set if available')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
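
The difference between the two operators is easy to reproduce in isolation. A minimal sketch, assuming a hypothetical capability bit `CAP_EXT_WIDTH` (the name and value are illustrative and not taken from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit, used only for illustration. */
#define CAP_EXT_WIDTH 0x0200u

static_assert((CAP_EXT_WIDTH & (CAP_EXT_WIDTH - 1)) == 0, "must be a single bit");

/* Buggy check: logical AND is true whenever both operands are non-zero. */
static bool has_cap_logical(uint16_t cap_mask)
{
	return cap_mask && CAP_EXT_WIDTH;
}

/* Correct check: bitwise AND tests the specific bit. */
static bool has_cap_bitwise(uint16_t cap_mask)
{
	return (cap_mask & CAP_EXT_WIDTH) != 0;
}
```

With a mask of `0x0001`, the logical version reports the capability as present although the bit is clear, while the bitwise version reports it as absent.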

06 Feb, 2016 4 commits

RDMA/ocrdma: Fixing ocrdma debugfs directory remove · 7425f410

Authored by Selvin Xavier on Feb 05, 2016

During the ocrdma device remove sequence, the debugfs directory
tree of each ocrdma device needs to be removed. Use
debugfs_remove_recursive instead of debugfs_remove.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/ocrdma: Fix pkey_index returned by driver in rq work completion · aff3ead9

Authored by Selvin Xavier on Feb 05, 2016

The driver currently returns the pkey value instead of the pkey index.
The pkey index is always zero, since ocrdma supports only the default
pkey.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/ocrdma: populate max_sge_rd in device attributes · 7d82df16

Authored by Selvin Xavier on Feb 05, 2016

max_sge_rd is used by some of the ULPs to calculate the maximum
number of SGEs that can be used for RDMA READ. Populate this
value in the response to the query_device verb. Also, avoid checking
max_srq_sge while populating max_sge.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/ocrdma: Initialize stats resources in the driver before ib device registration. · fd98d896

Authored by Selvin Xavier on Feb 05, 2016

In the latest kernel, the driver's process_mad hook can be invoked as
soon as the device is registered. In this hook, the ocrdma driver issues a
command to get the stats counters from the HW. This triggers a system
crash, since the driver has not yet allocated the statistics command resources.
Change the initialization sequence to avoid this crash.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

04 Feb, 2016 2 commits

IB/sysfs: remove unused va_list args · 9f780dab

Authored by Colin Ian King on Jan 25, 2016

_show_port_gid_attr performs a va_end on some unused va_list args.
Clean this up by removing the args completely.

Fixes: 470be516 ("IB/core: Add gid attributes to sysfs")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/IPoIB: Do not set skb truesize since using one linearskb · bb6a7773

Authored by Carol L Soto on Feb 03, 2016

We are seeing this warning: at net/core/skbuff.c:4174
and before commit a44878d1 ("IB/ipoib: Use one linear skb in RX flow")
skb truesize was not being set when ipoib was using just one skb.
Removing this line avoids the warning when running tcp tests like iperf.

Fixes: a44878d1 ("IB/ipoib: Use one linear skb in RX flow")
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

03 Feb, 2016 5 commits

IB/core: Set correct payload length for RoCEv2 over IPv6 · 1c5e0809

Authored by Moni Shoua on Jan 28, 2016

For GSI QP traffic, the count of the UDP header bytes was missing from
the IPv6 header. Fix that.

Fixes: 25f40220 ('IB/core: Initialize UD header structure with IP and UDP headers')
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx5: Use MLX5_GET to correctly get end of padding mode · 01581fb8

Authored by Maor Gottlieb on Jan 28, 2016

MLX5_GET64 was used on end_padding_mode, which is a 2-bit field.
This is wrong as the calculated offset is incorrect. Using MLX5_GET
instead of MLX5_GET64 to fix that.

Fixes: 0fb2ed66 ('IB/mlx5: Add create and destroy functionality for Raw Packet QP')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx5: Fix use of null pointer PD · 09f16cf5

Authored by Majd Dibbiny on Jan 28, 2016

When a Raw Ethernet QP is created, a NULL PD pointer could be used.
Fix that by using the PD only after checking that it is valid.
smatch also reported this error:
drivers/infiniband/hw/mlx5/qp.c:1629 mlx5_ib_create_qp()
	 error: we previously assumed 'pd' could be null (see line 1616)

Fixes: 0fb2ed66 ('IB/mlx5: Add create and destroy functionality for Raw Packet QP')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx5: Fix reqlen validation in mlx5_ib_alloc_ucontext · a168a41c

Authored by Majd Dibbiny on Jan 28, 2016

Older libraries that don't have all the new req_v2 fields
should be able to work as well. Today, if the library uses v2, it
will fail to allocate context since the size of reqlen is smaller
than the req_v2 size.

Fix the validation to be with the original req_v2 size and not
the current.

Fixes: f72300c5 ('IB/mlx5: Expose CQE version to user-space')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
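
One common way to validate against the "original" size of a struct that has since grown is to take the offset of the first field added later. The sketch below assumes a hypothetical request layout (`struct req_v2` and its field names are illustrative, not the mlx5 ABI):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical request layout: the first three fields form the original
 * v2 request; the last two were appended by a later kernel.
 */
struct req_v2 {
	uint32_t field_a;
	uint32_t field_b;
	uint32_t flags;
	uint8_t  new_field;      /* added later */
	uint8_t  reserved[3];    /* added later */
};

/* Size of the request as older libraries know it. */
#define REQ_V2_MIN_SZ offsetof(struct req_v2, new_field)

static_assert(REQ_V2_MIN_SZ < sizeof(struct req_v2), "struct has grown");

/* Too strict: rejects older libraries that send the shorter request. */
static bool reqlen_ok_current(size_t reqlen)
{
	return reqlen >= sizeof(struct req_v2);
}

/* Accepts both older and newer libraries. */
static bool reqlen_ok_original(size_t reqlen)
{
	return reqlen >= REQ_V2_MIN_SZ;
}
```

Here an older library sends 12 bytes, which the check against the current size rejects and the check against the original size accepts.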

IB/mlx5: Add CREATE_CQ and CREATE_QP to uverbs_ex_cmd_mask · d4584ddf

Authored by Matan Barak on Jan 28, 2016

The mlx5_ib driver supports the extended create_cq and create_qp user
verbs. In the current mechanism, a vendor supporting an extended uverb
should set the appropriate bit in the uverbs_ex_cmd_mask field.
Adding the actual support by setting the required bits in order to
support features like completion time-stamping and cross-channel.

Fixes: 972ecb82 ('IB/mlx5: Add create_cq extended command')
Fixes: ddf9529b ('IB/core: Allow setting create flags in QP init attribute')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

23 Jan, 2016 1 commit

wrappers for ->i_mutex access · 5955102c

Authored by Al Viro on Jan 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
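
The wrapper pattern described above can be modelled in userspace as follows. This is a toy sketch: `struct mutex` and the `mutex_*` functions here are simplified single-threaded stand-ins written for this example, not the kernel implementations, and only four of the wrappers are shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's struct mutex (single-threaded model). */
struct mutex { bool locked; };

static void mutex_lock(struct mutex *m)      { m->locked = true; }
static void mutex_unlock(struct mutex *m)    { assert(m->locked); m->locked = false; }
static bool mutex_is_locked(struct mutex *m) { return m->locked; }

static bool mutex_trylock(struct mutex *m)
{
	if (m->locked)
		return false;
	m->locked = true;
	return true;
}

struct inode { struct mutex i_mutex; };

/* inode_foo(inode) is mutex_foo(&inode->i_mutex). */
static inline void inode_lock(struct inode *inode)      { mutex_lock(&inode->i_mutex); }
static inline void inode_unlock(struct inode *inode)    { mutex_unlock(&inode->i_mutex); }
static inline bool inode_trylock(struct inode *inode)   { return mutex_trylock(&inode->i_mutex); }
static inline bool inode_is_locked(struct inode *inode) { return mutex_is_locked(&inode->i_mutex); }
```

Because callers name only the wrappers, the type of the underlying lock can later change without touching every call site, which is the stated motivation for the planned rwsem conversion.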

22 Jan, 2016 12 commits

IB/mlx5: Unify CQ create flags check · 34356f64

Authored by Leon Romanovsky on Dec 29, 2015

create_cq() can receive creation flags, which were handled
differently by the two commits that added the extended create_cq
command and cross-channel support. As a result, the merged code
did not accept any flags at all.

This patch unifies the check into one function and one return
error code.

Fixes: 972ecb82 ("IB/mlx5: Add create_cq extended command")
Fixes: 051f2630 ("IB/mlx5: Add driver cross-channel support")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx5: Expose Raw Packet QP to user space consumers · ad5f8e96

Authored by majd@mellanox.com on Jan 14, 2016

Added Raw Packet QP modify functionality which will enable user
space consumers to use it.

Since a Raw Packet QP is built of SQ and RQ sub-objects,
Raw Packet QP state changes are implemented by changing the state
of the sub-objects.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib · 427c1e7b

Authored by majd@mellanox.com on Jan 14, 2016

When modifying a QP, the desired operation was determined in
the mlx5_core using a transition table that takes the current
state, the final state, and returns the desired operation.

Since this logic will be used for Raw Packet QP, move the
operation table to the mlx5_ib.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
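
A transition table of this kind is typically a two-dimensional array indexed by the current and the final state. The sketch below is a generic illustration: the state names, operation names, and set of legal transitions are hypothetical and do not reproduce the mlx5 table.

```c
#include <assert.h>

/* Hypothetical QP states and modify operations, for illustration only. */
enum qp_state { QP_RST, QP_INIT, QP_RTR, QP_RTS, QP_NUM_STATES };
enum qp_op { OP_INVALID = 0, OP_RST2INIT, OP_INIT2RTR, OP_RTR2RTS, OP_2RST };

/* optab[cur][next] is the operation to issue; unlisted entries are OP_INVALID. */
static const enum qp_op optab[QP_NUM_STATES][QP_NUM_STATES] = {
	[QP_RST]  = { [QP_RST] = OP_2RST, [QP_INIT] = OP_RST2INIT },
	[QP_INIT] = { [QP_RST] = OP_2RST, [QP_RTR]  = OP_INIT2RTR },
	[QP_RTR]  = { [QP_RST] = OP_2RST, [QP_RTS]  = OP_RTR2RTS },
	[QP_RTS]  = { [QP_RST] = OP_2RST },
};

/* Look up the operation for a (current state, final state) pair. */
static enum qp_op lookup_op(enum qp_state cur, enum qp_state next)
{
	assert(cur < QP_NUM_STATES && next < QP_NUM_STATES);
	return optab[cur][next];
}
```

Keeping the mapping in data rather than in branching code makes it straightforward to relocate, which is what this commit does by moving the table to mlx5_ib.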

IB/mlx5: Support setting Ethernet priority for Raw Packet QPs · 75850d0b

Authored by majd@mellanox.com on Jan 14, 2016

When the user changes the Address Vector (AV) in modify QP, they
provide an SL. This SL should be translated to an Ethernet priority
by taking its 3 least significant bits, and the QP's TIS should be
modified according to this Ethernet priority.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
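
Taking the 3 least significant bits amounts to masking with 0x7. A minimal sketch, in which `sl_to_eth_prio()` is a hypothetical helper name and not a function from the driver:

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the 3 least significant bits of the SL. */
static uint8_t sl_to_eth_prio(uint8_t sl)
{
	return sl & 0x7;
}

static_assert((0xd & 0x7) == 5, "SL 13 maps to priority 5");
```

Any SL in the range 0 to 7 maps to itself, and larger values wrap into that range.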

IB/mlx5: Add Raw Packet QP query functionality · 6d2f89df

Authored by majd@mellanox.com on Jan 14, 2016

Since Raw Packet QP is composed of RQ and SQ, the IB QP's
state is derived from the sub-objects. Therefore we need
to query each one of the sub-objects, and decide on the
IB QP's state.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx5: Add create and destroy functionality for Raw Packet QP · 0fb2ed66

Authored by majd@mellanox.com on Jan 14, 2016

This patch adds support for Raw Packet QP for the mlx5 device.

Raw Packet QP, unlike other QP types, has no matching mlx5_core_qp
object; instead, it is built of RQ/SQ/TIR/TIS/TD mlx5_core objects.

Since the SQ and RQ work-queue (WQ) buffers are not contiguous like
other QPs, we allocate separate buffers in the user-space and pass
the address of each one of them separately to the kernel.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types · 19098df2

Authored by majd@mellanox.com on Jan 14, 2016

Extract specific IB QP fields to mlx5_ib_qp_trans structure.
The mlx5_core QP object resides in mlx5_ib_qp_base, which all QP types
inherit from. When we need to find mlx5_ib_qp using mlx5_core QP
(event handling and co), we use a pointer that resides in
mlx5_ib_qp_base.

In addition, we delete all redundant fields that weren't used anywhere
in the code:
-doorbell_qpn
-sq_max_wqes_per_wr
-sq_spare_wqes
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx5: Allocate a Transport Domain for each ucontext · 146d2f1a

Authored by majd@mellanox.com on Jan 14, 2016

A Transport Domain groups several TIS and TIR objects. By grouping
these objects, it defines whether local loopback packets that
are sent from the TIS objects in the group are received by the
TIR objects in the same group.

Allocate a Transport Domain (TD) for each user context to be used
in the future by Raw Packet QP for Self-Loopback Control.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx5: Expose CQE version to user-space · f72300c5

Authored by Haggai Abramovsky on Jan 14, 2016

Per user context, work with CQE version that both the user-space
and the kernel support. Report this CQE version via the response of
the alloc_ucontext command.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx5: Add CQE version 1 support to user QPs and SRQs · cfb5e088

Authored by Haggai Abramovsky on Jan 14, 2016

Enforce working with CQE version 1 when the user supports CQE
version 1 and asked to work this way.

If the user still works with CQE version 0, then use the default
CQE version to tell the Firmware that the user still works in the
older mode.

After this patch, the kernel still reports CQE version 0.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext · dfbee859

Authored by Haggai Abramovsky on Jan 14, 2016

The wrong buffer size was passed to ib_is_udata_cleared.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/sa: Fix netlink local service GFP crash · 2deeb477

Authored by Kaike Wan on Jan 21, 2016

The rdma netlink local service registers a handler to handle RESOLVE
response and another handler to handle SET_TIMEOUT request. The first
thing these handlers do is to call netlink_capable() to check the
access right of the received skb to make sure that the sender has root
access. Under normal conditions, such responses and requests will be
directly forwarded to the handlers without going through the netlink_dump
pathway (see ibnl_rcv_msg() in drivers/infiniband/core/netlink.c).
However, a user application could send a RESOLVE request (not response)
to the local service, which will fall into the netlink_dump pathway,
where a new skb will be created without initializing the control block.
This new skb will be eventually forwarded to the local service RESOLVE
response handler. Unfortunately, netlink_capable() will cause general
protection fault if the skb's control block is not initialized. This
patch will address the problem by checking the skb first.
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

20 Jan, 2016 12 commits

IB/srpt: Remove redundant wc array · f9a6ed62

Authored by Sagi Grimberg on Jan 12, 2016

No usage after the conversion to the new CQ API.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/qib: Improve ipoib UD performance · 967bcfc0

Authored by Mike Marciniszyn on Dec 24, 2015

Based on profiling, UD performance drops in case of processes
in a single client due to excess context switches when
the progress workqueue is scheduled.

This is solved by modifying the heuristic to select the
direct progress instead of the scheduling progress via
the workqueue when UD-like situations are detected in
the heuristic.
Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx4: Advertise RoCE v2 support · 4ed088e6

Authored by Matan Barak on Jan 14, 2016

Advertise RoCE v2 support in port_immutable attributes according to
the hardware's capabilities. This enables the verbs stack to use
RoCE v2 mode.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx4: Create and use another QP1 for RoCEv2 · e1b866c6

Authored by Moni Shoua on Jan 14, 2016

The mlx4 driver uses a special QP to implement the GSI QP. This kind
of QP allows building the InfiniBand headers in software.
When mlx4 hardware builds the packet, it calculates the ICRC and puts
it at the end of the payload. However, this ICRC calculation depends
on the QP configuration, which is determined when the QP is modified
(roce_mode during INIT->RTR).
When receiving a packet, the ICRC verification doesn't depend on this
configuration.
Therefore, two GSI QPs for send (one for each RoCE version) and
one GSI QP for receive are required.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers · 3ef967a4

Authored by Moni Shoua on Jan 14, 2016

RoCEv2 packets are sent over IP/UDP protocols.
The mlx4 driver uses a type of RAW QP to send packets for QP1 and
therefore needs to build the network headers below BTH in software.

This patch adds an option to build QP1 packets with IP and UDP headers if
RoCEv2 is requested.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx4: Enable RoCE v2 when the IB device is added · 71a39bbb

Authored by Moni Shoua on Jan 14, 2016

If the hardware supports RoCE v2, we configure the hardware UDP
port according to the RoCE v2 Annex when mlx4_ib device is added.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx4: Support modify_qp for RoCE v2 · 3b5daf28

Authored by Moni Shoua on Jan 14, 2016

In order to support modify_qp for RoCE v2, we need to set
the gid_type in the QP context.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx4: Add support for setting RoCEv2 gids in hardware · 7e57b85c

Authored by Moni Shoua on Jan 14, 2016

To tell hardware about a gid with type RoCEv2, software needs a new
modifier to the SET_PORT command: MLX4_SET_PORT_ROCE_ADDR. This can
replace the old method, MLX4_SET_PORT_GID_TABLE, for RoCEv1 gids.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx4: Add gid_type to GID properties · b699a859

Authored by Moni Shoua on Jan 14, 2016

The IB core adds a type property to struct ib_gid_attr.
The mlx4 driver should take that into consideration when modifying or
querying the hardware gid table.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Use hop-limit from IP stack for RoCE · c3efe750

Authored by Matan Barak on Jan 04, 2016

Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for
RoCE. Fix that by taking the hop limit values from ip4_dst_hoplimit
and ip6_dst_hoplimit.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Rename rdma_addr_find_dmac_by_grh · f7f4b23e

Authored by Matan Barak on Jan 04, 2016

rdma_addr_find_dmac_by_grh resolves dmac, vlan_id and if_index, and a
downstream patch will also add hop_limit as an output parameter,
thus we rename it to rdma_addr_find_l2_eth_by_grh.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/cm: Fix a recently introduced deadlock · 4bfdf635

Authored by Bart Van Assche on Jan 01, 2016

ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock
that can be locked from inside an interrupt handler. Hence do not
enable interrupts inside cm_enter_timewait() if called with interrupts
disabled.

This patch fixes e.g. the following deadlock:
Acked-by: Erez Shitrit <erezsh@mellanox.com>

=================================
[ INFO: inconsistent lock state ]
4.4.0-rc7+ #1 Tainted: G            E
---------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
(&(&cm_id_priv->lock)->rlock){?.+...}, at: [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
{HARDIRQ-ON-W} state was registered at:
  [<ffffffff810a3c11>] mark_held_locks+0x71/0x90
  [<ffffffff810a3e87>] trace_hardirqs_on_caller+0xa7/0x1c0
  [<ffffffff810a3fad>] trace_hardirqs_on+0xd/0x10
  [<ffffffff8151c40b>] _raw_spin_unlock_irq+0x2b/0x40
  [<ffffffffa036ea8e>] cm_enter_timewait+0xae/0x100 [ib_cm]
  [<ffffffffa036ff76>] ib_send_cm_drep+0xb6/0x190 [ib_cm]
  [<ffffffffa052ed08>] srp_cm_handler+0x128/0x1a0 [ib_srp]
  [<ffffffffa0370340>] cm_process_work+0x20/0xf0 [ib_cm]
  [<ffffffffa0371335>] cm_dreq_handler+0x135/0x2c0 [ib_cm]
  [<ffffffffa03733c5>] cm_work_handler+0x75/0xd0 [ib_cm]
  [<ffffffff8107184d>] process_one_work+0x1bd/0x460
  [<ffffffff81073148>] worker_thread+0x118/0x420
  [<ffffffff81078454>] kthread+0xe4/0x100
  [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70
irq event stamp: 1672286
hardirqs last  enabled at (1672283): [<ffffffff81408ec0>] poll_idle+0x10/0x80
hardirqs last disabled at (1672284): [<ffffffff8151d304>] common_interrupt+0x84/0x89
softirqs last  enabled at (1672286): [<ffffffff8105b4dc>] _local_bh_enable+0x1c/0x50
softirqs last disabled at (1672285): [<ffffffff8105b697>] irq_enter+0x47/0x70

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&cm_id_priv->lock)->rlock);
  <Interrupt>
    lock(&(&cm_id_priv->lock)->rlock);

 *** DEADLOCK ***

no locks held by swapper/8/0.

stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: G            E   4.4.0-rc7+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
 ffff88045af5e950 ffff88046e503a88 ffffffff81251c1b 0000000000000007
 0000000000000006 0000000000000003 ffff88045af5ddc0 ffff88046e503ad8
 ffffffff810a32f4 0000000000000000 0000000000000000 0000000000000001
Call Trace:
 <IRQ>  [<ffffffff81251c1b>] dump_stack+0x4f/0x74
 [<ffffffff810a32f4>] print_usage_bug+0x184/0x190
 [<ffffffff810a36e2>] mark_lock_irq+0xf2/0x290
 [<ffffffff810a3995>] mark_lock+0x115/0x1b0
 [<ffffffff810a3b8c>] mark_irqflags+0x15c/0x170
 [<ffffffff810a4fef>] __lock_acquire+0x1ef/0x560
 [<ffffffff810a53c2>] lock_acquire+0x62/0x80
 [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
 [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
 [<ffffffffa036f031>] ib_cm_notify+0x31/0x100 [ib_cm]
 [<ffffffffa0637f24>] srpt_qp_event+0x54/0xd0 [ib_srpt]
 [<ffffffffa0196052>] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib]
 [<ffffffffa00775b9>] mlx4_qp_event+0x69/0xd0 [mlx4_core]
 [<ffffffffa006000e>] mlx4_eq_int+0x51e/0xd50 [mlx4_core]
 [<ffffffffa006084f>] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
 [<ffffffff810b67b0>] handle_irq_event_percpu+0x40/0x110
 [<ffffffff810b68bf>] handle_irq_event+0x3f/0x70
 [<ffffffff810ba7f9>] handle_edge_irq+0x79/0x120
 [<ffffffff81007f3d>] handle_irq+0x5d/0x130
 [<ffffffff810071fd>] do_IRQ+0x6d/0x130
 [<ffffffff8151d309>] common_interrupt+0x89/0x89
 <EOI>  [<ffffffff8140895f>] cpuidle_enter_state+0xcf/0x200
 [<ffffffff81408aa2>] cpuidle_enter+0x12/0x20
 [<ffffffff810990d6>] call_cpuidle+0x36/0x60
 [<ffffffff81099163>] cpuidle_idle_call+0x63/0x110
 [<ffffffff8109930a>] cpu_idle_loop+0xfa/0x130
 [<ffffffff8109934e>] cpu_startup_entry+0xe/0x10
 [<ffffffff8103c443>] start_secondary+0x83/0x90

Fixes: commit be4b4993 ("IB/cm: Do not queue work to a device that's going away")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
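
The hazard described in this commit can be shown with a toy userspace model of one CPU's interrupt-enable flag. Everything here is a simulation written for this example: the `model_*` functions only mimic the interrupt side effects of the unconditional and the save/restore lock variants, and take no real locks. Save/restore is one common way to avoid re-enabling interrupts in a nested helper; the sketch does not claim to reproduce the exact change made by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

/* Unconditional variants: unlock always turns interrupts back on. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Save/restore variants: unlock restores whatever state was saved. */
static bool model_lock_irqsave(void)
{
	bool flags = irqs_enabled;
	irqs_enabled = false;
	return flags;
}

static void model_unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* Inner helper using the unconditional variants. */
static void inner_irq(void)
{
	model_lock_irq();
	model_unlock_irq();
}

/* Inner helper using the save/restore variants. */
static void inner_irqsave(void)
{
	bool flags = model_lock_irqsave();
	model_unlock_irqrestore(flags);
}

/*
 * Calls `inner` while an outer lock holds interrupts off. Returns true if
 * interrupts were re-enabled before the outer lock was released.
 */
static bool outer_calls(void (*inner)(void))
{
	assert(inner != NULL);
	bool flags = model_lock_irqsave();   /* outer lock: interrupts now off */
	inner();
	bool leaked = irqs_enabled;          /* true means they came back too early */
	model_unlock_irqrestore(flags);
	return leaked;
}
```

In the model, `outer_calls(inner_irq)` reports the leak and `outer_calls(inner_irqsave)` does not. An interrupt handler that takes the same outer lock during that window corresponds to the "Possible unsafe locking scenario" in the lockdep report above.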
