Commits · ad18e20ba2887e221e903d311f4c9a1586eacffb · openanolis / cloud-kernel

May 09, 2018: 12 commits

RDMA/hns: Update convert function of endian format · ad18e20b

Committed by oulijun on May 04, 2018

Because the sys_image_guid field of struct ib_device_attr is __be64,
it needs to be converted with cpu_to_be64.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hns: Load the RoCE dirver automatically · f97a62c3

Committed by oulijun on May 04, 2018

To enable the kernel to load the hns-roce-hw-v2 driver automatically
when an hns-roce-hw-v2 device is plugged into the PCI bus, create a
MODULE_DEVICE_TABLE that exposes the pci_table of hns-roce-hw-v2 to
user space.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Reported-by: Zhou Wang <wangzhou1@hisilicon.com>
Tested-by: Xiaojun Tan <tanxiaojun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hns: Bugfix for rq record db for kernel · 3a39bbec

Committed by oulijun on May 04, 2018

When the RQ record doorbell is used for kernel QPs, rdb_en of hr_qp
must be set to 1 and the DMA address of the RQ record doorbell must
be configured in the QP context.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hns: Add rq inline flags judgement · ecaaf1e2

Committed by oulijun on May 04, 2018

Set the rqie field of the QP context according to the configured RQ
inline flags. In addition, use the RQ inline flags to decide whether
to post an inline RQ WQE.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

nvmet,rxe: defer ip datagram sending to tasklet · 1661d3b0

Committed by Alexandru Moise on May 08, 2018

This addresses 3 separate problems:

1. When using NVME over Fabrics, we may end up sending IP
packets in interrupt context; this work should be deferred
to a tasklet.

[   50.939957] WARNING: CPU: 3 PID: 0 at kernel/softirq.c:161 __local_bh_enable_ip+0x1f/0xa0
[   50.942602] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G        W         4.17.0-rc3-ARCH+ #104
[   50.945466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[   50.948163] RIP: 0010:__local_bh_enable_ip+0x1f/0xa0
[   50.949631] RSP: 0018:ffff88009c183900 EFLAGS: 00010006
[   50.951029] RAX: 0000000080010403 RBX: 0000000000000200 RCX: 0000000000000001
[   50.952636] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffff817e04ec
[   50.954278] RBP: ffff88009c183910 R08: 0000000000000001 R09: 0000000000000614
[   50.956000] R10: ffffea00021d5500 R11: 0000000000000001 R12: ffffffff817e04ec
[   50.957779] R13: 0000000000000000 R14: ffff88009566f400 R15: ffff8800956c7000
[   50.959402] FS:  0000000000000000(0000) GS:ffff88009c180000(0000) knlGS:0000000000000000
[   50.961552] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   50.963798] CR2: 000055c4ec0ccac0 CR3: 0000000002209001 CR4: 00000000000606e0
[   50.966121] Call Trace:
[   50.966845]  <IRQ>
[   50.967497]  __dev_queue_xmit+0x62d/0x690
[   50.968722]  dev_queue_xmit+0x10/0x20
[   50.969894]  neigh_resolve_output+0x173/0x190
[   50.971244]  ip_finish_output2+0x2b8/0x370
[   50.972527]  ip_finish_output+0x1d2/0x220
[   50.973785]  ? ip_finish_output+0x1d2/0x220
[   50.975010]  ip_output+0xd4/0x100
[   50.975903]  ip_local_out+0x3b/0x50
[   50.976823]  rxe_send+0x74/0x120
[   50.977702]  rxe_requester+0xe3b/0x10b0
[   50.978881]  ? ip_local_deliver_finish+0xd1/0xe0
[   50.980260]  rxe_do_task+0x85/0x100
[   50.981386]  rxe_run_task+0x2f/0x40
[   50.982470]  rxe_post_send+0x51a/0x550
[   50.983591]  nvmet_rdma_queue_response+0x10a/0x170
[   50.985024]  __nvmet_req_complete+0x95/0xa0
[   50.986287]  nvmet_req_complete+0x15/0x60
[   50.987469]  nvmet_bio_done+0x2d/0x40
[   50.988564]  bio_endio+0x12c/0x140
[   50.989654]  blk_update_request+0x185/0x2a0
[   50.990947]  blk_mq_end_request+0x1e/0x80
[   50.991997]  nvme_complete_rq+0x1cc/0x1e0
[   50.993171]  nvme_pci_complete_rq+0x117/0x120
[   50.994355]  __blk_mq_complete_request+0x15e/0x180
[   50.995988]  blk_mq_complete_request+0x6f/0xa0
[   50.997304]  nvme_process_cq+0xe0/0x1b0
[   50.998494]  nvme_irq+0x28/0x50
[   50.999572]  __handle_irq_event_percpu+0xa2/0x1c0
[   51.000986]  handle_irq_event_percpu+0x32/0x80
[   51.002356]  handle_irq_event+0x3c/0x60
[   51.003463]  handle_edge_irq+0x1c9/0x200
[   51.004473]  handle_irq+0x23/0x30
[   51.005363]  do_IRQ+0x46/0xd0
[   51.006182]  common_interrupt+0xf/0xf
[   51.007129]  </IRQ>

2. Work must always be offloaded to a tasklet for rxe_post_send_kernel()
when using NVMEoF, in order to resolve the lock ordering between the
neigh->ha_lock seqlock and the nvme queue lock:

[   77.833783]  Possible interrupt unsafe locking scenario:
[   77.833783]
[   77.835831]        CPU0                    CPU1
[   77.837129]        ----                    ----
[   77.838313]   lock(&(&n->ha_lock)->seqcount);
[   77.839550]                                local_irq_disable();
[   77.841377]                                lock(&(&nvmeq->q_lock)->rlock);
[   77.843222]                                lock(&(&n->ha_lock)->seqcount);
[   77.845178]   <Interrupt>
[   77.846298]     lock(&(&nvmeq->q_lock)->rlock);
[   77.847986]
[   77.847986]  *** DEADLOCK ***

3. The same goes for the lock ordering between sch->q.lock and the nvme queue lock:

[   47.634271]  Possible interrupt unsafe locking scenario:
[   47.634271]
[   47.636452]        CPU0                    CPU1
[   47.637861]        ----                    ----
[   47.639285]   lock(&(&sch->q.lock)->rlock);
[   47.640654]                                local_irq_disable();
[   47.642451]                                lock(&(&nvmeq->q_lock)->rlock);
[   47.644521]                                lock(&(&sch->q.lock)->rlock);
[   47.646480]   <Interrupt>
[   47.647263]     lock(&(&nvmeq->q_lock)->rlock);
[   47.648492]
[   47.648492]  *** DEADLOCK ***

Using NVMEoF after this patch finally seems to be stable; without it,
rxe eventually deadlocks the whole system and causes RCU stalls.

Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

i40iw: Use correct address in dst_neigh_lookup for IPv6 · eeb1af4f

Committed by Mustafa Ismail on May 07, 2018

Using the wrong structure address for the IPv6 neighbor lookup
causes connections to IPv6 addresses to fail. Fix this by
passing the correct address to dst_neigh_lookup.

Fixes: f27b4746 ("i40iw: add connection management code")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

i40iw: Fix memory leak in error path of create QP · 5a7189d5

Committed by Mustafa Ismail on May 07, 2018

If i40iw_allocate_dma_mem fails when creating a QP, the
memory allocated for the QP structure using kzalloc is not
freed, because iwqp->allocated_buffer is used to free the
memory and it is not set up until later. Fix this by setting
iwqp->allocated_buffer before allocating the DMA memory.

Fixes: d3749841 ("i40iw: add files for iwarp interface")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/mlx5: Use proper spec flow label type · 37da2a03

Committed by Daria Velikovsky on May 07, 2018

The flow label is defined as u32 in the IPv6 flow spec, but it
was stored internally as u8 during flow spec parsing. That
caused part of the flow_label value to be lost.

Fixes: 2d1e697e ('IB/mlx5: Add support to match inner packet fields')
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Daria Velikovsky <daria@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/mlx5: Don't assume that medium blueFlame register exists · 18b0362e

Committed by Yishai Hadas on May 07, 2018

The user can leave the system without medium BlueFlame registers;
however, the code assumed that at least one such register exists.

This patch fixes that assumption.

Fixes: c1be5232 ("IB/mlx5: Fix micro UAR allocator")
Reported-by: Rohit Zambre <rzambre@uci.edu>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/hfi1: Use after free race condition in send context error path · f9e76ca3

Committed by Michael J. Ruhl on May 02, 2018

A pio send egress error can occur when the PSM library attempts
to send a bad packet. That issue is still being investigated.

The pio error interrupt handler then attempts to progress the recovery
of the errored pio send context.

Code inspection reveals that the handling lacks the necessary locking
if that recovery interleaves with a PSM close of the "context" object
that contains the pio send context.

The missing locking can cause the recovery to access the already
freed pio send context object and incorrectly deduce that the pio
send context is actually a kernel pio send context, as shown by the
NULL dereference stack below:

[<ffffffff8143d78c>] _dev_info+0x6c/0x90
[<ffffffffc0613230>] sc_restart+0x70/0x1f0 [hfi1]
[<ffffffff816ab124>] ? __schedule+0x424/0x9b0
[<ffffffffc06133c5>] sc_halted+0x15/0x20 [hfi1]
[<ffffffff810aa3ba>] process_one_work+0x17a/0x440
[<ffffffff810ab086>] worker_thread+0x126/0x3c0
[<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0
[<ffffffff810b252f>] kthread+0xcf/0xe0
[<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
[<ffffffff816b8798>] ret_from_fork+0x58/0x90
[<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40

This is the best-case scenario; in other scenarios the already
freed memory can be corrupted.

Fix by adding the necessary locking in the pio send context error
handler.

Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

MAINTAINERS: Remove bouncing @mellanox.com addresses · 27f70620

Committed by Leon Romanovsky on May 03, 2018

Delete non-existent @mellanox.com addresses from MAINTAINERS file.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB: remove redundant INFINIBAND kconfig dependencies · 9533b292

Committed by Greg Thelen on May 03, 2018

INFINIBAND_ADDR_TRANS depends on INFINIBAND, so there's no need for
options that depend on INFINIBAND_ADDR_TRANS to also depend on INFINIBAND.
Remove the unnecessary INFINIBAND dependencies.

Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

May 04, 2018: 8 commits

RDMA/cma: Do not query GID during QP state transition to RTR · 9aa16921

Committed by Parav Pandit on May 02, 2018

When commit [1] was added, SGID was queried to derive the SMAC address.
Then, later on during a refactor [2], SMAC was no longer needed. However,
the now useless GID query remained.  Then during additional code changes
later on, the GID query was being done in such a way that it caused iWARP
queries to start breaking.  Remove the useless GID query and resolve the
iWARP breakage at the same time.

This is discussed in [3].

[1] commit dd5f03be ("IB/core: Ethernet L2 attributes in verbs/cm structures")
[2] commit 5c266b23 ("IB/cm: Remove the usage of smac and vid of qp_attr and cm_av")
[3] https://www.spinics.net/lists/linux-rdma/msg63951.html

Suggested-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx4: Fix integer overflow when calculating optimal MTT size · b03bcde9

Committed by Jack Morgenstein on May 02, 2018

When the kernel was compiled using the UBSAN option,
we saw the following stack trace:

[ 1184.827917] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx4/mr.c:349:27
[ 1184.828114] signed integer overflow:
[ 1184.828247] -2147483648 - 1 cannot be represented in type 'int'

The problem was caused by calling round_up in procedure
mlx4_ib_umem_calc_optimal_mtt_size (on line 349, as noted in the stack
trace) with the second parameter (1 << block_shift) (which is an int).
The second parameter should have been (1ULL << block_shift) (which
is an unsigned long long).

(1 << block_shift) is treated by the compiler as an int (because 1 is
an integer).

Now, local variable block_shift is initialized to 31.
If block_shift is 31, 1 << block_shift is 1 << 31 = 0x80000000 = -2147483648.
This is the most negative int value.

Inside the round_up macro, there is a cast applied to ((1 << 31) - 1).
However, this cast is applied AFTER ((1 << 31) - 1) is calculated.
Since (1 << 31) is treated as an int, we get the negative overflow
identified by UBSAN in the process of calculating ((1 << 31) - 1).

The fix is to change (1 << block_shift) to (1ULL << block_shift) on
line 349.

Fixes: 9901abf5 ("IB/mlx4: Use optimal numbers of MTT entries")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/hfi1: Fix memory leak in exception path in get_irq_affinity() · 59482a14

Committed by Sebastian Sanchez on May 01, 2018

When IRQ affinity is set and the interrupt type is unknown, a cpu
mask allocated within the function is never freed. Fix this memory
leak by allocating memory within the scope where it is used.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure · e9777ad4

Committed by Sebastian Sanchez on May 01, 2018

When allocating device data, if an allocation fails, memory that
was already allocated, such as the per-CPU counters, is not freed.

Fix the memory leaks in the exception path by creating a common,
reentrant cleanup function, hfi1_clean_devdata(), to be used both at
driver unload time and on device data allocation failure.

To accomplish this, free_platform_config() and clean_up_i2c() are
made reentrant so that they no longer depend on the order in which
they are called. Otherwise, this patch would introduce NULL pointer
dereferences.

In addition, set dd->int_counter, dd->rcv_limit,
dd->send_schedule and dd->tx_opstats to NULL after they're freed in
hfi1_clean_devdata(), so that hfi1_clean_devdata() is fully reentrant.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used · 45d92457

Committed by Sebastian Sanchez on May 01, 2018

When an invalid num_vls is used as a module parameter, the code
execution follows an exception path where the macro dd_dev_err()
expects dd->pcidev->dev not to be NULL in hfi1_init_dd(). This
causes a NULL pointer dereference.

Fix hfi1_init_dd() by initializing dd->pcidev and dd->pcidev->dev
earlier in the code. If a dd exists, then dd->pcidev and
dd->pcidev->dev always exist.

BUG: unable to handle kernel NULL pointer dereference
at 00000000000000f0
IP: __dev_printk+0x15/0x90
Workqueue: events work_for_cpu_fn
RIP: 0010:__dev_printk+0x15/0x90
Call Trace:
 dev_err+0x6c/0x90
 ? hfi1_init_pportdata+0x38d/0x3f0 [hfi1]
 hfi1_init_dd+0xdd/0x2530 [hfi1]
 ? pci_conf1_read+0xb2/0xf0
 ? pci_read_config_word.part.9+0x64/0x80
 ? pci_conf1_write+0xb0/0xf0
 ? pcie_capability_clear_and_set_word+0x57/0x80
 init_one+0x141/0x490 [hfi1]
 local_pci_probe+0x3f/0xa0
 work_for_cpu_fn+0x10/0x20
 process_one_work+0x152/0x350
 worker_thread+0x1cf/0x3e0
 kthread+0xf5/0x130
 ? max_active_store+0x80/0x80
 ? kthread_bind+0x10/0x10
 ? do_syscall_64+0x6e/0x1a0
 ? SyS_exit_group+0x10/0x10
 ret_from_fork+0x35/0x40

Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/hfi1: Fix loss of BECN with AHG · 0a0bcb04

Committed by Mike Marciniszyn on May 01, 2018

AHG may be armed to use the stored header, which by design is limited
to edits in the PSN/A 32 bit word (bth2).

When the code is trying to send a BECN, the use of the stored header
will lose the BECN bit.

Fix by avoiding AHG when getting ready to send a BECN. This is
accomplished by always claiming that the packet is not a middle packet,
which is the precondition for using AHG. BECNs are not the normal case,
so this should not hurt AHG optimizations.

Cc: <stable@vger.kernel.org> # 4.14.x
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/hfi1 Use correct type for num_user_context · 5da9e742

Committed by Michael J. Ruhl on May 01, 2018

The module parameter num_user_context is defined as 'int' and
defaults to -1, but module_param_named() declares it as uint.

Correct module_param_named() type information and update the modinfo
text to reflect the default value.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/hfi1: Fix handling of FECN marked multicast packet · f59fb9e0

Committed by Mike Marciniszyn on May 01, 2018

The code for handling a marked UD packet unconditionally returns the
dlid in the header of the FECN marked packet.  This is not correct
for multicast packets where the DLID is in the multicast range.

The subsequent attempt to send the CNP with the multicast LID will
cause the chip to halt the ack send context, because the source
LID doesn't match the chip programming. The halted send context
flushes any other pending packets in the pio ring, so the CNP is
not sent.

As part of investigating the fix, it was determined that the 16B work
broke the FECN routine badly with inconsistent use of 16-bit and 32-bit
types for LIDs and pkeys. Since the port's source LID was correctly 32
bits, the type mismatches need to be dealt with at the same time as
fixing the CNP header issue.

Fix these issues by:
- Using the port's LID as the SLID for responding to FECN marked UD
  packets
- Ensuring the pkey is always 16 bits in this and subordinate routines
- Ensuring LIDs are 32 bits in this and subordinate routines

Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: 88733e3b ("IB/hfi1: Add 16B UD support")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

May 01, 2018: 1 commit

IB/core: Make ib_mad_client_id atomic · db82476f

Committed by Håkon Bugge on April 18, 2018

Currently, the kernel protects access to the agent ID allocator on a per
port basis using a spinlock, so it is impossible for two apps/threads on
the same port to get the same TID, but it is entirely possible for two
threads on different ports to end up with the same TID.

As this can be confusing (regardless of it being legal according to the
IB Spec 1.3, C13-18.1.1, in section 13.4.6.4 - TransactionID usage),
and as the rdma-core user space API for /dev/umad devices implies unique
TIDs even across ports, make the TID an atomic type so that no two
allocations, regardless of port number, will be the same.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

April 28, 2018: 19 commits

iw_cxgb4: Atomically flush per QP HW CQEs · 2df19e19

Committed by Bharat Potnuri on April 27, 2018

When a CQ is shared by multiple QPs, c4iw_flush_hw_cq() needs to acquire
the corresponding QP lock before moving the CQEs into that QP's SW
queue and accessing the SQ contents for completing a WR.
Ignore CQEs if the corresponding QP has already been flushed.

Cc: stable@vger.kernel.org
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/uverbs: Fix kernel crash during MR deregistration flow · 54e7e48b

Committed by Ariel Levkovich on April 26, 2018

This patch fixes a crash that happens due to access to an
uninitialized DM pointer within the MR object.

The change makes sure the DM pointer in the MR object is set to
NULL during a non-DM MR creation to prevent a false indication
that this MR is related to a DM in the dereg flow.

Fixes: be934cca ("IB/uverbs: Add device memory registration ioctl support")
Reported-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/uverbs: Prevent reregistration of DM_MR to regular MR · 5ccbf63f

Committed by Ariel Levkovich on April 26, 2018

This patch adds a check in the ib_uverbs_rereg_mr flow to make
sure there is no attempt to reregister a device memory (DM) MR as a
regular MR. In that case, the command fails with -EINVAL.

Fixes: be934cca ("IB/uverbs: Add device memory registration ioctl support")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/mlx4: Add missed RSS hash inner header flag · 4f9ca2d8

Committed by Leon Romanovsky on April 26, 2018

Despite being advertised to user space applications, the RSS inner
header flag was filtered out by the checks at the beginning of the QP
creation routine.

Cc: <stable@vger.kernel.org> # 4.15
Fixes: 4d02ebd9 ("IB/mlx4: Fix RSS hash fields restrictions")
Fixes: 07d84f7b ("IB/mlx4: Add support to RSS hash for inner headers")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hns: Fix a couple misspellings · ab178849

Committed by oulijun on April 26, 2018

This patch fixes two spelling errors.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hns: Submit bad wr · 137ae320

Committed by oulijun on April 26, 2018

When a bad work request is generated, it needs to be
reported to the user. This patch fixes that.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hns: Update assignment method for owner field of send wqe · 634f6390

Committed by oulijun on April 26, 2018

When posting a work request, the owner bit of the send WQE
needs to be updated. This patch fixes the bug that occurs when
posting multiple work requests.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hns: Adjust the order of cleanup hem table · ae25db00

Committed by oulijun on April 26, 2018

This patch updates the order in which the HEM tables are cleaned up,
for trrl_table and irrl_table as well as for mtt_cqe_table and mtt_table.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set · b6dd9b34

Committed by oulijun on April 26, 2018

Only when the IB_QP_PATH_DEST_QPN flag of attr_mask is set
is it valid to assign the dqpn field of the QP context.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hns: Remove some unnecessary attr_mask judgement · 734f3863

Committed by oulijun on April 26, 2018

This patch deletes some unnecessary attr_mask if-conditions
in hip08, in accordance with the IB protocol.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set · 6852af86

Committed by oulijun on April 26, 2018

For QP types other than GSI and UD, the mtu field of the QP
context may only be assigned when the IB_QP_PATH_MTU flag of
attr_mask is set.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hns: Fix the qp context state diagram · 6e1a7094

Committed by oulijun on April 26, 2018

According to the RoCE protocol, it is legal to modify a QP from
the error state to the error state. This patch fixes the hip08 QP
state diagram to allow that transition.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hns: Intercept illegal RDMA operation when use inline data · 328d405b

Committed by oulijun on April 26, 2018

RDMA read operations do not support inline data. If a user issues
an RDMA read with inline data, a hardware error occurs. Intercept
such illegal operations.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/hns: Bugfix for init hem table · 215a8c09

Committed by oulijun on April 26, 2018

When initializing the HEM table, type should be used instead of
table->type, which is only initialized with type at the end.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/rxe: avoid double kfree_skb · 9fd4350b

Committed by Zhu Yanjun on April 26, 2018

When an skb is sent in soft RoCE, it passes through the following functions:

rxe_send [rdma_rxe]
    ip_local_out
        __ip_local_out
        ip_output
            ip_finish_output
                ip_finish_output2
                    dev_queue_xmit
                        __dev_queue_xmit
                            dev_hard_start_xmit

If an error occurs in any of the above functions, or if iptables rules
drop the skb after ip_local_out, kfree_skb is called there. So the soft
RoCE module must not call kfree_skb again; otherwise a crash will
occur.

The steps to reproduce:

     server                       client
    ---------                    ---------
    |1.1.1.1|<----rxe-channel--->|1.1.1.2|
    ---------                    ---------

On server: rping -s -a 1.1.1.1 -v -C 10000 -S 512
On client: rping -c -a 1.1.1.1 -v -C 10000 -S 512

The kernel configs CONFIG_DEBUG_KMEMLEAK and
CONFIG_DEBUG_OBJECTS are enabled on both server and client.

While rping is running, run the following command on the server:

iptables -I OUTPUT -p udp  --dport 4791 -j DROP

Without this patch, crash will occur.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV · 2da36d44

Committed by Jianchao Wang on April 26, 2018

Without RXE_START_MASK, the last_psn of IB_OPCODE_RC_SEND_ONLY_INV
is not updated in update_wqe_psn, and the corresponding WQE is
never acked in rxe_completer because its last_psn is zero. As a
result, the subsequent WQEs cannot be acked either, because the
IB_OPCODE_RC_SEND_ONLY_INV WQE with last_psn 0 is still pending.
This causes a large number of I/O timeouts when NVMe-oF runs over
rxe.

Add RXE_START_MASK for IB_OPCODE_RC_SEND_ONLY_INV to fix this.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/iwpm: fix memory leak on map_info · f96416ce

Committed by Colin Ian King on April 25, 2018

When iwpm_hash_bucket is NULL, or when get_mapinfo_hash_bucket()
returns NULL, map_info is never added to hash_bucket_head, so
map_info leaks. Fix this by initializing hash_bucket_head to NULL:
if it is still NULL afterwards, map_info was not added to
hash_bucket_head and must be freed.

Detected by CoverityScan, CID#1222481 ("Resource Leak")

Fixes: 30dc5e63 ("RDMA/core: Add support for iWARP Port Mapper user space service")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/ipoib: fix ipoib_start_xmit()'s return type · 47a3968a

Committed by Luc Van Oostenryck on April 24, 2018

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/nes: fix nes_netdev_start_xmit()'s return type · c192a12c

Committed by Luc Van Oostenryck on April 24, 2018

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

openanolis / cloud-kernel: last synced successfully nearly 2 years ago
