Commits · 210b1f78076f88cad25b333fffafbac6ae870fcc · openeuler / Kernel

07 Mar, 2018 · 7 commits

IB/mlx5: When not in dual port RoCE mode, use provided port as native · 210b1f78

Committed by Mark Bloch on Mar 05, 2018

The series that introduced dual port RoCE mode assumed that no dual port
HCA uses the mlx5 driver; this is not the case for Connect-IB HCAs. This
assumption led to assigning 1 as the native port index, which causes
issues when the second port is used.

For example, query_pkey(), when called on the second port, returns the
values of the first port. Make sure that we assign the right port index
as the native port index.

Fixes: 32f69e4b ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/mlx4: Include GID type when deleting GIDs from HW table under RoCE · a1817792

Committed by Jack M on Mar 05, 2018

The commit cited below added a gid_type field (RoCEv1 or RoCEv2)
to GID properties.

When adding GIDs, this gid_type field was copied over to the
hardware gid table. However, when deleting GIDs, the gid_type field
was not copied over to the hardware gid table.

As a result, when running RoCEv2, all RoCEv2 gids in the
hardware gid table were set to type RoCEv1 when any gid was deleted.

This problem would persist until the next gid was added (which would again
restore the gid_type field for all the gids in the hardware gid table).

Fix this by copying over the gid_type field to the hardware gid table
when deleting gids, so that the gid_type of all remaining gids is
preserved when a gid is deleted.

Fixes: b699a859 ("IB/mlx4: Add gid_type to GID properties")
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/mlx4: Fix corruption of RoCEv2 IPv4 GIDs · 0077416a

Committed by Jack Morgenstein on Mar 05, 2018

When using IPv4 addresses in RoCEv2, the GID format for the mapped
IPv4 address should be: ::ffff:<4-byte IPv4 address>.

In the cited commit, IPv4-mapped IPv6 addresses had the 3 upper dwords
zeroed out by memset(), which deleted the ffff field.

However, since ipv6_addr_v4mapped() already verifies that the
gid has the format ::ffff:<ipv4 address>, no change is needed for the gid,
and the memset() can simply be removed.

Fixes: 7e57b85c ("IB/mlx4: Add support for setting RoCEv2 gids in hardware")
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
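
The GID corruption above can be reproduced with a small userspace sketch. It assumes a GID is a plain 16-byte array laid out like an IPv6 address; `struct sketch_gid` and the helper names are hypothetical, not mlx4 code. The check mirrors what ipv6_addr_v4mapped() tests (bytes 0-9 zero, bytes 10-11 equal to 0xff), and `buggy_clear_upper_dwords()` reproduces the removed memset() that zeroed the three upper dwords.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte GID, stored like an IPv6 address (network order). */
struct sketch_gid {
	uint8_t raw[16];
};

/* True if the GID has the IPv4-mapped form ::ffff:a.b.c.d. */
static bool gid_is_v4mapped(const struct sketch_gid *g)
{
	static const uint8_t prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};
	return memcmp(g->raw, prefix, sizeof(prefix)) == 0;
}

/* Build ::ffff:<ipv4> from an IPv4 address given in network byte order. */
static void gid_from_ipv4(struct sketch_gid *g, const uint8_t ipv4[4])
{
	memset(g->raw, 0, 10);
	g->raw[10] = 0xff;
	g->raw[11] = 0xff;
	memcpy(&g->raw[12], ipv4, 4);
}

/* The buggy pattern: zeroing bytes 0-11 (three 32-bit words) also wipes
 * the 0xffff marker, so the GID is no longer IPv4-mapped. */
static void buggy_clear_upper_dwords(struct sketch_gid *g)
{
	memset(g->raw, 0, 12);
}
```

Because a GID that already passed the v4-mapped check has the correct layout, the sketch needs no extra rewriting step, which matches the reasoning for simply dropping the memset().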

RDMA/qedr: Fix iWARP write and send with immediate · 551e1c67

Committed by Kalderon, Michal on Mar 05, 2018

iWARP does not support RDMA WRITE or SEND with immediate data.
The driver should check for these opcodes before submitting to FW
and return an error immediately.
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
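
A minimal sketch of the early rejection described above, assuming a simplified opcode enum. `enum sketch_wr_opcode` and `sketch_check_wr()` are hypothetical stand-ins for the verbs work-request opcodes and the qedr post-send path; returning -EINVAL is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical opcode set mirroring the verbs work-request opcodes. */
enum sketch_wr_opcode {
	SK_WR_SEND,
	SK_WR_SEND_WITH_IMM,
	SK_WR_RDMA_WRITE,
	SK_WR_RDMA_WRITE_WITH_IMM,
};

/* iWARP has no immediate data, so reject those opcodes up front
 * instead of handing them to the firmware. */
static int sketch_check_wr(bool is_iwarp, enum sketch_wr_opcode op)
{
	if (is_iwarp &&
	    (op == SK_WR_SEND_WITH_IMM || op == SK_WR_RDMA_WRITE_WITH_IMM))
		return -EINVAL;
	return 0;
}
```

The same opcodes stay valid on RoCE, so the check is keyed on the transport rather than on the opcode alone.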

RDMA/qedr: Fix kernel panic when running fio over NFSoRDMA · e3fd112c

Committed by Kalderon, Michal on Mar 05, 2018

Race in qedr_poll_cq(): latest_cqe wasn't protected by the lock, so when
two contexts accessed poll_cq at the same time, one of them could end up
with a pointer to a stale latest_cqe and read an invalid CQE element.
Signed-off-by: Amit Radzi <Amit.Radzi@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/qedr: Fix iWARP connect with port mapper · ea0ed478

Committed by Kalderon, Michal on Mar 05, 2018

Fix iWARP connect and listen to use the mapped port for
IPv4 and IPv6. Without this fix, a server that has
iwpmd enabled will not use the correct port.
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/qedr: Fix ipv6 destination address resolution · 11052696

Committed by Kalderon, Michal on Mar 05, 2018

The wrong parameter was passed to dst_neigh_lookup().
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

01 Mar, 2018 · 7 commits

RDMA/bnxt_re: Fix the ib_reg failure cleanup · 497158aa

Committed by Selvin Xavier on Feb 26, 2018

Release the netdev references in the cleanup path, and invoke the cleanup
routines if bnxt_re_ib_reg() fails.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/bnxt_re: Fix incorrect DB offset calculation · c354dff0

Committed by Devesh Sharma on Feb 26, 2018

To support host systems with non-4K page sizes, l2_db_size must be
calculated using 4096 instead of PAGE_SIZE. Also, supply the host page size
to FW during initialization.
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/bnxt_re: Unconditionally fence non wire memory operations · a45bc17b

Committed by Devesh Sharma on Feb 26, 2018

The HW requires an unconditional fence for all non-wire memory operations
through the SQ. This guarantees the completion of these memory operations.
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/mlx: Set slid to zero in Ethernet completion struct · 65389322

Committed by Moni Shoua on Feb 25, 2018

The IB spec says that a LID should be ignored when the link layer is
Ethernet, for example when building or parsing a CM request message
(CA17-34). However, since ib_lid_be16() and ib_lid_cpu16() validate the
slid regardless of the link layer, we set the slid to zero to prevent
false warnings in the kernel log.

Fixes: 62ede777 ("Add OPA extended LID support")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

{net, IB}/mlx5: Raise fatal IB event when sys error occurs · aba46213

Committed by Daniel Jurgens on Feb 25, 2018

All other mlx5_events report the port number as 1-based, which is how FW
reports it in the port event EQE. Reporting 0 for this event causes
mlx5_ib to not raise a fatal event notification to registered clients
due to a seemingly invalid port.

All switch cases in mlx5_ib_event that go through the port check are
supposed to set the port now, so just do it once at variable
declaration.

Fixes: 89d44f0a ("net/mlx5_core: Add pci error handlers to mlx5_core driver")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/mlx5: Avoid passing an invalid QP type to firmware · e7b169f3

Committed by Noa Osherovich on Feb 25, 2018

During QP creation, the mlx5 driver translates the QP type to an
internal value which is passed on to FW. There was no check to make
sure that the translated value is valid, and -EINVAL was coerced into
the mailbox command.

Current firmware refuses this as an invalid QP type, but future/past
firmware may do something else.

Fixes: 09a7d9ec ("{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc")
Reviewed-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
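
The bug pattern is a translation helper that can return -EINVAL, whose result is then written straight into a command field. The sketch below shows the caller-side validation; the enum values and firmware encodings are hypothetical placeholders, not the real mlx5 ifc constants.

```c
#include <errno.h>

/* Hypothetical IB-level QP types; SK_IB_QPT_RAW has no FW encoding here. */
enum sketch_ib_qp_type { SK_IB_QPT_RC, SK_IB_QPT_UC, SK_IB_QPT_UD, SK_IB_QPT_RAW };

/* Returns a (made-up) firmware encoding, or -EINVAL for unknown types. */
static int sketch_to_fw_qp_type(enum sketch_ib_qp_type t)
{
	switch (t) {
	case SK_IB_QPT_RC: return 0x0;
	case SK_IB_QPT_UC: return 0x1;
	case SK_IB_QPT_UD: return 0x2;
	default:           return -EINVAL;
	}
}

/* Validate the translated value before it reaches the mailbox field. */
static int sketch_create_qp(enum sketch_ib_qp_type t, unsigned int *mbox_st)
{
	int fw_type = sketch_to_fw_qp_type(t);

	if (fw_type < 0)
		return fw_type;	/* never coerce -EINVAL into the command */
	*mbox_st = (unsigned int)fw_type;
	return 0;
}
```

On failure the mailbox field is left untouched, so no firmware version ever sees the negative value.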

IB/mlx5: Fix incorrect size of klms in the memory region · da343b6d

Committed by Sergey Gorenko on Feb 25, 2018

If sg_nents is greater than mr->max_descs, the function
mlx5_ib_sg_to_klms() sets mr->ndescs to a value greater than
mr->max_descs. This value is invalid, and it causes the
following error when registering the MR:

mlx5_0:dump_cqe:276:(pid 193): dump error cqe
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030: 00 00 00 00 0f 00 78 06 25 00 00 8b 08 1e 8f d3

Cc: <stable@vger.kernel.org> # 4.5
Fixes: b005d316 ("mlx5: Add arbitrary sg list support")
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
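
A sketch of the invariant the fix restores: the descriptor count reported to hardware is the number of entries actually written, never sg_nents. `struct sketch_mr` and `sketch_sg_to_klms()` are hypothetical simplifications, not the mlx5 function body.

```c
/* Hypothetical, stripped-down MR: only the fields relevant to the fix. */
struct sketch_mr {
	unsigned int ndescs;	/* descriptors actually filled in */
	unsigned int max_descs;	/* capacity of the descriptor array */
};

/* Map up to max_descs scatter entries and report how many were mapped. */
static unsigned int sketch_sg_to_klms(struct sketch_mr *mr, unsigned int sg_nents)
{
	unsigned int i;

	for (i = 0; i < sg_nents && i < mr->max_descs; i++) {
		/* ... fill klms[i] from the i-th scatterlist entry ... */
	}
	mr->ndescs = i;	/* bounded by max_descs, unlike assigning sg_nents */
	return i;
}
```

Returning the mapped count also lets the caller detect a partial mapping instead of registering an oversized MR.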

21 Feb, 2018 · 5 commits

RDMA/bnxt_re: Avoid system hang during device un-reg · 7374fbd9

Committed by Selvin Xavier on Feb 15, 2018

BNXT_RE_FLAG_TASK_IN_PROG doesn't handle multiple work
requests posted together. Track the scheduling of multiple
workqueue items by maintaining a per-device counter,
and proceed with IB dereg only if this counter is zero.
flush_workqueue() is no longer required in the
NETDEV_UNREGISTER path.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
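
A single "task in progress" flag is cleared by the first completing work item even if a second one is still queued; a counter does not have that problem. This is a userspace sketch using C11 atomics; `struct sketch_rdev` and the helpers are hypothetical, and the driver itself would use kernel atomics and queue_work().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device state: number of queued-but-unfinished items. */
struct sketch_rdev {
	atomic_int sched_count;
};

static void sketch_schedule_work(struct sketch_rdev *rdev)
{
	atomic_fetch_add(&rdev->sched_count, 1);
	/* queue_work(...) would follow here in a kernel driver */
}

static void sketch_work_done(struct sketch_rdev *rdev)
{
	atomic_fetch_sub(&rdev->sched_count, 1);
}

/* IB deregistration may proceed only when no work items are outstanding. */
static bool sketch_can_dereg(struct sketch_rdev *rdev)
{
	return atomic_load(&rdev->sched_count) == 0;
}
```

With two items scheduled and one completed, `sketch_can_dereg()` still reports false, which is exactly the case the flag got wrong.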

RDMA/bnxt_re: Fix system crash during load/unload · dcdaba08

Committed by Selvin Xavier on Feb 15, 2018

During driver unload, the driver proceeds with cleanup
without waiting for the scheduled events, so the device
pointers get freed and the driver crashes when those events
run later.

Flush the bnxt_re_task workqueue before starting
device removal.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/bnxt_re: Synchronize destroy_qp with poll_cq · 3b921e3b

Committed by Selvin Xavier on Feb 15, 2018

Avoid a system crash when destroy_qp is invoked while
the driver is processing poll_cq. Synchronize these
functions using the cq_lock.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/bnxt_re: Unpin SQ and RQ memory if QP create fails · 6b4521f5

Committed by Devesh Sharma on Feb 15, 2018

The driver leaves the QP memory pinned if the QP create command
fails in the FW. Avoid this scenario by adding a proper
exit path if the FW command fails.
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/bnxt_re: Disable atomic capability on bnxt_re adapters · 7ff662b7

Committed by Devesh Sharma on Feb 15, 2018

More testing needs to be done before enabling this feature.
Disable the feature temporarily.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

16 Feb, 2018 · 1 commit

RDMA/vmw_pvrdma: Fix usage of user response structures in ABI file · 1f5a6c47

Committed by Adit Ranadive on Feb 15, 2018

This ensures that we return the right structures back to userspace.
Otherwise, it looks like the reserved fields in the response structures
in userspace might have uninitialized data in them.

Fixes: 8b10ba78 ("RDMA/vmw_pvrdma: Add shared receive queue support")
Fixes: 29c8d9eb ("IB: Add vmw_pvrdma driver")
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

12 Feb, 2018 · 1 commit

vfs: do bulk POLL* -> EPOLL* replacement · a9a08845

Committed by Linus Torvalds on Feb 11, 2018

This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do.  But the keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

05 Feb, 2018 · 1 commit

RDMA/hns: Fix the endian problem for hns · 8b9b8d14

Committed by oulijun on Feb 05, 2018

The hip06 and hip08 run on little-endian ARM, so the annotations need
to be revised to indicate that the HW uses little-endian data in the
various DMA buffers, and the necessary swaps need to flow throughout.

The imm_data field uses big-endian mode. The cpu_to_le32/le32_to_cpu
swaps are no-ops on this platform, which makes the only substantive
change the handling of imm_data, which is now always swapped.

This also keeps the driver consistent with the userspace hns driver and
resolves the sparse warnings.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
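
Why only imm_data changes behavior can be seen by serializing the same value both ways. This portable sketch writes bytes explicitly instead of using the kernel's cpu_to_le32()/cpu_to_be32(); the helper names are hypothetical. On a little-endian CPU the LE layout equals the native layout (so the LE swap is a no-op), while the BE layout of imm_data is a real byte swap.

```c
#include <stdint.h>

/* Least significant byte first: the layout a __le32 DMA field expects. */
static void sketch_put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* Most significant byte first: the big-endian layout used by imm_data. */
static void sketch_put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}
```

Writing the bytes by shifting keeps the result independent of the host byte order, which is the property the sparse annotations are meant to enforce.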

02 Feb, 2018 · 18 commits

IB/hfi1: Add 16B rcvhdr trace support · 6197a815

Committed by Don Hiatt on Feb 01, 2018

Add trace_hfi1_rcvhdr support for bypass packets.
While here, remove the etype argument as it is available
in struct hfi1_packet.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node · 953a9ceb

Committed by Kamenee Arumugam on Feb 01, 2018

The kzalloc_node() API doesn't check for overflow in the size
multiplication. The kcalloc() API does check for overflow in the size
multiplication, but it is not NUMA-aware.

This conversion allows an allocation used in the hot path to be placed
on the local NUMA node, and ensures overflow-free multiplication when
computing the size of a memory allocation.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
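
The overflow check that kcalloc()/kcalloc_node() add over an open-coded `kzalloc_node(n * size, ...)` can be sketched in userspace as follows. `sketch_calloc_checked()` is a hypothetical stand-in; NUMA placement has no userspace equivalent here and is not modelled.

```c
#include <stdint.h>
#include <stdlib.h>

/* Refuse the allocation if n * size would wrap around SIZE_MAX, instead
 * of silently allocating a truncated (too small) buffer. */
static void *sketch_calloc_checked(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow */
	return calloc(n, size);
}
```

Dividing SIZE_MAX by size avoids computing the overflowing product in the first place.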

IB/hfi1: Show fault stats in both TX and RX directions · b5de809e

Committed by Mitko Haralanov on Feb 01, 2018

The routine which shows the fault stats checks the counters
to determine whether to show any stats based on the number of
transmitted pkts/bytes for a particular opcode.

Unfortunately, it only checked the receive counters. As a result,
if any packet faults have happened for packets egressing the HFI,
those stats would not be shown.

In order to fix this, the routine is amended to also check the
TX counters. With this change the pkt/byte counts are the sum of
both TX and RX counts for the opcode.

Fixes: 1b311f89 ("IB/hfi1: Add tx_opcode_stats like the opcode_stats")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/hfi1: Remove blind constants from 16B update · 78d3633b

Committed by Mike Marciniszyn on Feb 01, 2018

These values were introduced as part of the 16B code to
account for the varying size of the LRH between the differing
packet formats.

Replace the blind constants with defines based on FIELD_SIZEOF()
calls.

Fixes: 5b6cabb0 ("IB/hfi1: Add 16B RC/UC support")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
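
Deriving the constant from the structures means it cannot drift from the header layout. The FIELD_SIZEOF() definition below matches the kernel macro of that era; the two structs are hypothetical stand-ins for the 9B and 16B LRH layouts (4 x 16-bit versus 4 x 32-bit words).

```c
#include <stdint.h>

/* Same definition the kernel used for FIELD_SIZEOF() at the time. */
#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))

/* Hypothetical header layouts standing in for the 9B and 16B LRH. */
struct sketch_lrh_9b  { uint16_t lrh[4]; };
struct sketch_lrh_16b { uint32_t lrh[4]; };

/* LRH size difference derived from the structs, not a blind "8". */
#define SKETCH_LRH_DELTA \
	(FIELD_SIZEOF(struct sketch_lrh_16b, lrh) - \
	 FIELD_SIZEOF(struct sketch_lrh_9b, lrh))
```

The null-pointer expression is never evaluated, because sizeof only inspects the operand's type.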

IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times · 07190076

Committed by Kamenee Arumugam on Feb 01, 2018

The HFI counters SendWaitCnt and SendWaitVlCnt are in units
of TXE cycle time (at 805 MHz). The OPA counters PortXmitWait and
PortVLXmitWait are in units of flit times.
Convert the counter values to flit units using the following
conversion formula:

PortXmitWait =
	SendWaitCnt * 2 * (4 / link_width) * (25 Gbps / link_speed)
PortVLXmitWait =
	SendWaitVLCnt * 2 * (4 / link_width) * (25 Gbps / link_speed)

At link up or downgrade events, the link width can change. To ensure
accurate counter calculations, sample the counters after the events,
during counter requests, and then aggregate the OPA counters.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
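
The conversion formula above can be evaluated in integer arithmetic by multiplying before dividing. This is a sketch, not the hfi1 implementation: it assumes link_speed is given in Mbps (so 25 Gbps is 25000) and returns 0 for a down link, both choices of this example.

```c
#include <stdint.h>

/* flits = cycles * 2 * (4 / link_width) * (25 Gbps / link_speed),
 * rearranged as cycles * 2 * 4 * 25000 / (link_width * link_speed_mbps)
 * so no precision is lost to early integer division. */
static uint64_t sketch_wait_to_flits(uint64_t cycles, unsigned int link_width,
				     unsigned int link_speed_mbps)
{
	if (link_width == 0 || link_speed_mbps == 0)
		return 0;	/* link down: nothing meaningful to convert */
	return cycles * 2 * 4 * 25000 /
	       ((uint64_t)link_width * link_speed_mbps);
}
```

For example, 100 cycles on a 4x link at 25 Gbps convert to 200 flit times, while the same 100 cycles on a 1x link convert to 800, which is why the width must be sampled around link-up and downgrade events.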

IB/hfi1: Do not override given pcie_pset value · 6391214f

Committed by Bartlomiej Dudek on Feb 01, 2018

During the PCIe Gen 3 transition, pcie_pset is read and might be overridden
with a default value (i.e. 255) in the do_pcie_gen3_transition() routine.

If the pcie_pset value is overridden, this new value will be used
during initialization of the next adapter on a different card.

Introduce a new local variable to avoid modifying pcie_pset.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Patel Jay P <jay.p.patel@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
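
The pattern behind the fix is resolving a shared module parameter into a per-device local rather than writing the resolved value back. In this sketch `pcie_pset` is modelled as a plain global; the 255 "unset" marker and `sketch_pick_pset()` are hypothetical, chosen only to show that one card's choice no longer leaks into the next card's probe.

```c
#define SKETCH_PSET_UNSET 255u	/* hypothetical "not given" marker */

/* Module-wide parameter shared by every adapter that gets probed. */
static unsigned int pcie_pset = SKETCH_PSET_UNSET;

/* Resolve the preset for one card; the global is only ever read. */
static unsigned int sketch_pick_pset(unsigned int card_default)
{
	unsigned int pset = pcie_pset;	/* local copy, never written back */

	if (pset == SKETCH_PSET_UNSET)
		pset = card_default;
	return pset;
}
```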

IB/hfi1: Optimize process_receive_ib() · aca7f4fc

Committed by Sebastian Sanchez on Feb 01, 2018

The arguments for trace_hfi1_rcvhdr() get computed every
time in the hot path regardless of whether the trace
is on or off, which a profile shows to be costly.
The handling of fault injection isolates the verbs device for
all packets regardless of the presence of an RHF_DC_ERR error.

Fix the first by computing the trace_hfi1_rcvhdr() arguments within
the trace itself, so that when the trace is off, the argument
data isn't computed. Fix the second by moving the error check to
handle_eflags() when an RHF error occurs and by testing for
RHF_DC_ERR before executing the rest of handle_eflags().
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/hfi1: Remove unnecessary fecn and becn fields · ca85bb1c

Committed by Sebastian Sanchez on Feb 01, 2018

packet->fecn and packet->becn are calculated in the hot path
and are never used. Remove these fields, as a profile shows them
to be costly. Also, remove the initialization of becn and fecn
in process_ecn(), as they're unconditionally assigned in the
function, and ensure that the fecn and becn variables use a
boolean type.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/hfi1: Look up ibport using a pointer in receive path · bdaf96f6

Committed by Sebastian Sanchez on Feb 01, 2018

In the receive path, hfi1_ibport is looked up by indexing into an
array, which a profile shows to be expensive. The receive context
data has a pointer to the ibport data; use that pointer instead.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/hfi1: Optimize packet type comparison using 9B and bypass code paths · 6d6b8848

Committed by Sebastian Sanchez on Feb 01, 2018

The packet type comparison used in the hot path to find out whether a
packet is a bypass packet is an expensive operation, as seen in a profile.

Determine the packet's pkey and migration bit in the separate bypass and
9B code paths instead.
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/hfi1: Compute BTH only for RDMA_WRITE_LAST/SEND_LAST packet · f150e273

Committed by Sebastian Sanchez on Feb 01, 2018

In hfi1_rc_rcv(), BTH is computed for all packets received.
However, it's only used for packets received with opcodes
RDMA_WRITE_LAST and SEND_LAST, and it is a costly operation.

Compute BTH only in the RDMA_WRITE_LAST/SEND_LAST code path
and let the compiler handle endianness conversion for bitwise
operations.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/hfi1: Remove dependence on qp->s_hdrwords · 9636258f

Committed by Mitko Haralanov on Feb 01, 2018

The s_hdrwords variable was used to indicate whether a
packet was already built on a previous iteration of the
send engine. This variable assumed the protection of the
QP's RVT_S_BUSY flag, which was required since the
QP's s_lock was dropped just prior to the packet being
queued on one of the egress mechanisms.

Support for multiple send engine instantiations requires
that the field not be used due to concurrency issues.
The ps.txreq signals "already built" without the
potential concurrency issues.

Fix by getting rid of all s_hdrwords usage. A wrapper
is added to test for the "already built" case that used to
rely on s_hdrwords.

What used to be stored in s_hdrwords is now in the txreq.
The PBC is not counted, but is added in the pio/sdma code
paths prior to posting the packet.
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/hfi1: Fix for potential refcount leak in hfi1_open_file() · 2b1e7fe1

Committed by Alex Estrin on Feb 01, 2018

The dd refcount is speculatively incremented prior to allocating
the fd memory with kzalloc(). If that kzalloc() fails, the dd
refcount leaks.
Increment the refcount only after kzalloc() succeeds.

Fixes: e11ffbd5 ("IB/hfi1: Do not free hfi1 cdev parent structure early")
Reviewed-by: Michael J Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
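
Taking the reference only after the allocation succeeds leaves nothing to undo on the failure path. In this sketch the structures are hypothetical, calloc() stands in for kzalloc(), and the `simulate_oom` parameter exists only to exercise the failure path.

```c
#include <stdlib.h>

/* Hypothetical device with a refcount, and a per-open file context. */
struct sketch_dd { int refcount; };
struct sketch_fd { struct sketch_dd *dd; };

static struct sketch_fd *sketch_open(struct sketch_dd *dd, int simulate_oom)
{
	struct sketch_fd *fd = simulate_oom ? NULL : calloc(1, sizeof(*fd));

	if (!fd)
		return NULL;	/* no reference taken yet, so nothing leaks */
	dd->refcount++;		/* increment only on allocation success */
	fd->dd = dd;
	return fd;
}
```

The alternative, incrementing first and decrementing on the error path, also works but is easier to get wrong as more failure points are added.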

IB/hfi1: Fix for early release of sdma context · 473291b3

Committed by Alex Estrin on Feb 01, 2018

With the IRQF_SHARED flag set and CONFIG_DEBUG_SHIRQ enabled,
module removal may result in a panic in the sdma_interrupt() routine
if the associated sdma context was released before pci_free_irq():

[ 9198.939885] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 9198.940514] IP: sdma_make_progress+0xa5/0x450 [hfi1]
[ 9198.941114] PGD 170bdc0067 P4D 170bdc0067 PUD 172063e067 PMD 0
[ 9198.941783] Oops: 0000 [#1] SMP
.....
[ 9198.958877] CPU: 132 PID: 64173 Comm: rmmod Tainted: G           OE   4.14.0-rc4+ #1
[ 9198.961032] Hardware name: Intel Corporation S7200AP/S7200AP, BIOS S72C610.86B.01.02.0118.080620171935 08/06/2017
[ 9198.963323] task: ffff9681397f0000 task.stack: ffffae1647c40000
[ 9198.965695] RIP: 0010:sdma_make_progress+0xa5/0x450 [hfi1]
[ 9198.968082] RSP: 0018:ffffae1647c43be8 EFLAGS: 00010046
[ 9198.970503] RAX: 0000000000000000 RBX: ffff9680ce8b5ca8 RCX: 0000000000000000
[ 9198.973006] RDX: 0000000000000000 RSI: 0000000001a00d28 RDI: ffff9680ce8b5ca0
[ 9198.975546] RBP: ffffae1647c43c40 R08: ffff96814325ec00 R09: 00000000ffffffff
[ 9198.978142] R10: 000000004325e501 R11: ffff96814325ec00 R12: ffff9680ce8b5c44
[ 9198.980779] R13: ffff9680ce8b5ca0 R14: 0000000000000000 R15: ffff9680ce8b5b00
[ 9198.983462] FS:  00007f31196ba740(0000) GS:ffff96819df00000(0000) knlGS:0000000000000000
[ 9198.986231] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9198.989036] CR2: 0000000000000000 CR3: 000000170833f000 CR4: 00000000001406e0
[ 9198.991911] Call Trace:
[ 9198.994847]  sdma_engine_interrupt+0x82/0x100 [hfi1]
[ 9198.997852]  sdma_interrupt+0x61/0xc0 [hfi1]
[ 9199.000852]  __free_irq+0x1b3/0x2d0
[ 9199.003873]  free_irq+0x35/0x70
[ 9199.006909]  pci_free_irq+0x1c/0x30
[ 9199.009999]  clean_up_interrupts+0x53/0xf0 [hfi1]
[ 9199.013137]  hfi1_start_cleanup+0x117/0x190 [hfi1]
[ 9199.016315]  postinit_cleanup+0x1d/0x270 [hfi1]
[ 9199.019529]  remove_one+0x1f3/0x210 [hfi1]
[ 9199.022738]  pci_device_remove+0x39/0xc0
[ 9199.025974]  device_release_driver_internal+0x141/0x210
[ 9199.029268]  driver_detach+0x3f/0x80
[ 9199.032580]  bus_remove_driver+0x55/0xd0
[ 9199.035931]  driver_unregister+0x2c/0x50
[ 9199.039321]  pci_unregister_driver+0x2a/0xa0
[ 9199.042755]  hfi1_mod_cleanup+0x10/0xb50 [hfi1]
[ 9199.046196]  SyS_delete_module+0x171/0x250
...

Fix by exporting sdma_clean() and removing it from sdma_exit().
sdma_exit() now only manipulates the engine state,
leaving freeing of the memory to sdma_clean(), which is now called
just before the dd is freed.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/hfi1: Re-order IRQ cleanup to address driver cleanup race · 82a97926

Committed by Michael J. Ruhl on Feb 01, 2018

The pci_request_irq() interface always adds the IRQF_SHARED bit to
all IRQ requests.

When the kernel is built with the CONFIG_DEBUG_SHIRQ config flag and the
IRQF_SHARED bit is set, the __free_irq() function makes a call to the IRQ
handler. This tests the race condition between the IRQ cleanup and an
IRQ racing the cleanup. The HFI driver should be able to handle this
race, but does not.

This race can cause traces that start with this footprint:

BUG: unable to handle kernel NULL pointer dereference at   (null)
Call Trace:
 <hfi1 irq handler>
 ...
 __free_irq+0x1b3/0x2d0
 free_irq+0x35/0x70
 pci_free_irq+0x1c/0x30
 clean_up_interrupts+0x53/0xf0 [hfi1]
 hfi1_start_cleanup+0x122/0x190 [hfi1]
 postinit_cleanup+0x1d/0x280 [hfi1]
 remove_one+0x233/0x250 [hfi1]
 pci_device_remove+0x39/0xc0

Export IRQ cleanup function so it can be called from other modules.

Using the exported cleanup function:

  Re-order the driver cleanup code to clean up IRQ resources before
  other resources, eliminating the race.

  Re-order error path for init so that the race does not occur.

Reduce the severity of a spurious error message for SDMA IRQs to info.
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Reviewed-by: Patel Jay P <jay.p.patel@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/hns: Fix misplaced call to hns_roce_cleanup_hem_table · 0da65503

Committed by oulijun on Jan 30, 2018

The mtt_table is cleaned up at the err_unmap_cqe label; it is a
mistake to duplicate the cleanup in the later unwind labels.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/hns: Add names to function arguments in function pointers · fd012f1c

Committed by oulijun on Jan 30, 2018

This patch mainly fixes some style warnings raised by the new checkpatch
requirement. The warning is as follows:

WARNING: function definition argument 'struct hns_roce_cq *' should also have
an identifier name
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
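
The checkpatch warning concerns function-pointer members whose parameters have a type but no name. A minimal sketch of the preferred style; `struct sketch_hw_ops` and its member are hypothetical, not the actual hns ops table.

```c
struct sketch_cq;

/* Every parameter of a function-pointer member carries a name. */
struct sketch_hw_ops {
	/* before: int (*req_notify_cq)(struct sketch_cq *, int); */
	int (*req_notify_cq)(struct sketch_cq *cq, int flags);
};

struct sketch_cq { int armed; };

static int sketch_req_notify(struct sketch_cq *cq, int flags)
{
	cq->armed = flags;
	return 0;
}

static const struct sketch_hw_ops sketch_ops = {
	.req_notify_cq = sketch_req_notify,
};
```

The names do not change the generated code; they document what each argument means at the point where the ops table is declared.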

RDMA/hns: Remove unnecessary operator · c2799119

Committed by oulijun on Jan 30, 2018

The double not-operator is unnecessary when used in a boolean context. This
patch removes them.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
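
A short illustration of when `!!` is redundant and when it is not. `SKETCH_FLAG_EN` and both helpers are hypothetical; the point is that `if (x)` and conversion to `bool` already normalize any non-zero value, whereas storing into a plain integer does not.

```c
#include <stdbool.h>

#define SKETCH_FLAG_EN 0x4u	/* hypothetical flag bit */

/* In a boolean context "!!" adds nothing. */
static bool sketch_flag_set(unsigned int flags)
{
	if (flags & SKETCH_FLAG_EN)	/* same as if (!!(flags & SKETCH_FLAG_EN)) */
		return true;
	return false;
}

/* "!!" still matters when the result is stored in a non-bool integer. */
static int sketch_flag_as_int(unsigned int flags)
{
	return !!(flags & SKETCH_FLAG_EN);	/* 1 rather than 4 */
}
```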
