Commits · 08aeb97cb82483192bd8ad8e60d1b73ce1b75923 · openeuler / Kernel

06 Sep, 2018 · 3 commits

RDMA/mlx5: Add new flow action verb - packet reformat · 08aeb97c

Committed by Mark Bloch on Aug 28, 2018

For now, only add the L2_TUNNEL_TO_L2 option. This allows a generic decap
operation to be performed if the encapsulating protocol is L2 based and the
inner packet is also L2 based. For example, this can be used to decap VXLAN
packets.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
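
The L2_TUNNEL_TO_L2 action runs in the NIC; the sketch below only illustrates, in software, which bytes a VXLAN decap removes. It assumes an untagged outer Ethernet header, an IPv4 outer header and the IANA VXLAN port 4789 (RFC 7348). `vxlan_inner_l2_offset()` is hypothetical and is not part of the mlx5 driver or its uAPI.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN    14
#define IPV4_MIN_LEN   20
#define UDP_HDR_LEN    8
#define VXLAN_HDR_LEN  8
#define VXLAN_UDP_PORT 4789  /* IANA-assigned VXLAN port (RFC 7348) */

/* Read a big-endian (network order) 16-bit field. */
static uint16_t rd_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: return the offset of the inner L2 (Ethernet) frame
 * inside an outer Ethernet/IPv4/UDP/VXLAN packet, or -1 if the packet does
 * not look like VXLAN. Assumes no VLAN tag and an IPv4 outer header.
 */
long vxlan_inner_l2_offset(const uint8_t *pkt, size_t len)
{
	size_t off = ETH_HDR_LEN;
	size_t ihl;

	if (len < ETH_HDR_LEN + IPV4_MIN_LEN + UDP_HDR_LEN + VXLAN_HDR_LEN)
		return -1;
	if (rd_be16(pkt + 12) != 0x0800)          /* outer EtherType: IPv4 */
		return -1;

	ihl = (size_t)(pkt[off] & 0x0f) * 4;      /* IPv4 header length */
	if ((pkt[off] >> 4) != 4 || ihl < IPV4_MIN_LEN || pkt[off + 9] != 17)
		return -1;                        /* not IPv4, or not UDP */
	off += ihl;

	if (off + UDP_HDR_LEN + VXLAN_HDR_LEN + ETH_HDR_LEN > len)
		return -1;                        /* truncated packet */
	if (rd_be16(pkt + off + 2) != VXLAN_UDP_PORT)
		return -1;                        /* UDP destination port */
	off += UDP_HDR_LEN;

	if (!(pkt[off] & 0x08))                   /* VXLAN "I" flag (VNI valid) */
		return -1;
	off += VXLAN_HDR_LEN;

	return (long)off;   /* decap = drop everything before this offset */
}
```

For a minimal packet (20-byte IPv4 header) the inner frame starts at byte 50: 14 (Ethernet) + 20 (IPv4) + 8 (UDP) + 8 (VXLAN).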

RDMA/uverbs: Add generic function to fill in flow action object · 841eefc5

Committed by Mark Bloch on Aug 28, 2018

Refactor the initialization of a flow action object into a common function.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/mlx5: Add a new flow action verb - modify header · b4749bf2

Committed by Mark Bloch on Aug 28, 2018

Expose the ability to create a flow action that changes packet headers.
The data passed from userspace should be modify header actions as defined
by the HW specification.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

05 Sep, 2018 · 1 commit

{net, RDMA}/mlx5: Rename encap to reformat packet · 60786f09

Committed by Mark Bloch on Aug 28, 2018

Rename all encap mlx5_{core,ib} code to use the new packet reformat naming.
This change doesn't introduce any functional change and is needed to
properly reflect the operation performed by this action: for example, a
packet can not only be encapsulated, but also decapsulated.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

23 Aug, 2018 · 1 commit

mm, oom: distinguish blockable mode for mmu notifiers · 93065ac7

Committed by Michal Hocko on Aug 21, 2018

There are several blockable mmu notifiers which might sleep in
mmu_notifier_invalidate_range_start and that is a problem for the
oom_reaper because it needs to guarantee a forward progress so it cannot
depend on any sleepable locks.

Currently we simply back off and mark an oom victim with blockable mmu
notifiers as done after a short sleep.  That can result in selecting a new
oom victim prematurely because the previous one still hasn't torn its
memory down yet.

We can do much better though.  Even if mmu notifiers use sleepable locks
there is no reason to automatically assume those locks are held.  Moreover
the majority of notifiers only care about a portion of the address space and
there is absolutely zero reason to fail when we are unmapping an unrelated
range.  Many notifiers really do block and wait for HW, though, which is
harder to handle, and there we have to bail out.

This patch handles the low-hanging fruit.
__mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
are not allowed to sleep if the flag is set to false.  This is achieved by
using trylock instead of the sleepable lock for most callbacks and
continuing as long as we do not block down the call chain.

I think we can improve that even further because there is a common pattern
to do a range lookup first and then do something about that.  The first
part can be done without a sleeping lock in most cases AFAICS.

The oom_reaper end then simply retries if there is at least one notifier
which couldn't make any progress in !blockable mode.  A retry loop is
already implemented to wait for the mmap_sem and this is basically the
same thing.

The simplest way for driver developers to test this code path is to wrap
userspace code which uses these notifiers into a memcg and set the hard
limit to hit the oom.  This can be done e.g. after the test faults in all
the mmu notifier managed memory and sets the hard limit to something really
small.  Then we are looking for a proper process teardown.

[akpm@linux-foundation.org: coding style fixes]
[akpm@linux-foundation.org: minor code simplification]
Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
Reported-by: David Rientjes <rientjes@google.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
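
The trylock pattern described above can be modelled in userspace roughly as follows. This is a minimal sketch: `struct toy_notifier`, `toy_invalidate_range_start()` and the spinning `toy_lock()` are hypothetical stand-ins (a real driver would use a sleepable mutex); only the idea of the `blockable` flag and of returning a retryable failure comes from the patch.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical notifier that protects a tracked address range with a lock. */
struct toy_notifier {
	atomic_flag lock;            /* stands in for a sleepable mutex */
	unsigned long start, end;    /* range this notifier cares about */
};

static void toy_lock(struct toy_notifier *n)
{
	while (atomic_flag_test_and_set(&n->lock))
		;                    /* a real driver would sleep here */
}

static bool toy_trylock(struct toy_notifier *n)
{
	return !atomic_flag_test_and_set(&n->lock);
}

static void toy_unlock(struct toy_notifier *n)
{
	atomic_flag_clear(&n->lock);
}

/* Returns 0 on success, -EAGAIN if it would have to block while !blockable. */
int toy_invalidate_range_start(struct toy_notifier *n, unsigned long start,
			       unsigned long end, bool blockable)
{
	/* Unrelated range: nothing to tear down, so never fail. */
	if (end <= n->start || start >= n->end)
		return 0;

	if (blockable)
		toy_lock(n);
	else if (!toy_trylock(n))
		return -EAGAIN;      /* the caller (e.g. oom_reaper) retries */

	/* ... tear down device mappings for [start, end) here ... */
	toy_unlock(n);
	return 0;
}
```

The early return for unrelated ranges is what lets most !blockable calls succeed even while the lock is held elsewhere.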

21 Aug, 2018 · 1 commit

IB/hfi1: Invalid NUMA node information can cause a divide by zero · c513de49

Committed by Michael J. Ruhl on Aug 15, 2018

If the system BIOS does not supply NUMA node information to the
PCI devices, the NUMA node is selected by choosing the current
node.

This can lead to the following crash:

divide error: 0000 SMP
CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G          IOE
------------   3.10.0-693.21.1.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
SE5C610.86B.01.01.0005.101720141054 10/17/2014
Workqueue: events work_for_cpu_fn
task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000
RIP: 0010: [<ffffffffc020ac69>] hfi1_dev_affinity_init+0x129/0x6a0 [hfi1]
RSP: 0018:ffff88017448bbf8  EFLAGS: 00010246
RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000
RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002
R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0
R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012
FS:  0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 hfi1_init_dd+0x14b3/0x27a0 [hfi1]
 ? pcie_capability_write_word+0x46/0x70
 ? hfi1_pcie_init+0xc0/0x200 [hfi1]
 do_init_one+0x153/0x4c0 [hfi1]
 ? sched_clock_cpu+0x85/0xc0
 init_one+0x1b5/0x260 [hfi1]
 local_pci_probe+0x4a/0xb0
 work_for_cpu_fn+0x1a/0x30
 process_one_work+0x17f/0x440
 worker_thread+0x278/0x3c0
 ? manage_workers.isra.24+0x2a0/0x2a0
 kthread+0xd1/0xe0
 ? insert_kthread_work+0x40/0x40
 ret_from_fork+0x77/0xb0
 ? insert_kthread_work+0x40/0x40

If the BIOS is not supplying NUMA information:
  - set the default table count to 1 for all possible nodes
  - select node 0 (instead of the current NUMA node) to get consistent
    performance
  - generate an error indicating that the BIOS should be upgraded

Reviewed-by: Gary Leshner <gary.s.leshner@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
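
A simplified sketch of the fallback listed above, assuming a per-node counter array that is later used as a divisor. `toy_per_node_cnt`, `toy_pick_node()`, `toy_share_per_device()` and the message text are hypothetical, not the hfi1 code; `NUMA_NO_NODE` is -1, as in the kernel.

```c
#include <stdio.h>

#define NUMA_NO_NODE  (-1)  /* the kernel's "no node" sentinel value */
#define TOY_MAX_NODES 4     /* hypothetical number of possible nodes */

/* Hypothetical per-node device counts; 0 means "never initialized". */
static unsigned int toy_per_node_cnt[TOY_MAX_NODES];

/* Pick a node for the device, falling back to node 0 when the BIOS
 * provided no NUMA information for the PCI device. */
int toy_pick_node(int pci_node)
{
	if (pci_node == NUMA_NO_NODE) {
		for (int i = 0; i < TOY_MAX_NODES; i++)
			if (toy_per_node_cnt[i] == 0)
				toy_per_node_cnt[i] = 1;   /* default count */
		fprintf(stderr, "no NUMA info from BIOS, using node 0; "
			"consider a BIOS upgrade\n");
		return 0;   /* node 0 rather than the current node */
	}
	return pci_node;
}

/* The kind of arithmetic that faulted: a division by a per-node count. */
unsigned int toy_share_per_device(int node, unsigned int total)
{
	return total / toy_per_node_cnt[node];
}
```

Without the default of 1, `toy_share_per_device()` would divide by zero for any node whose counter was never incremented.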

16 Aug, 2018 · 1 commit

RDMA/hns: Fix usage of bitmap allocation functions return values · a1ceeca6

Committed by Gal Pressman on Aug 09, 2018

The hns bitmap allocation functions return 0 on success and -1 on failure.
Callers of these functions wrongly used their return value as an errno;
fix that by making a proper conversion.

Fixes: a598c6f4 ("IB/hns: Simplify function of pd alloc and qp alloc")
Signed-off-by: Gal Pressman <pressmangal@gmail.com>
Acked-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
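
The conversion can be sketched as below, assuming an allocator with the same 0/-1 contract. `struct toy_bitmap`, `toy_bitmap_alloc()` and `toy_pd_alloc()` are hypothetical names. On Linux, propagating a raw -1 would look like -EPERM (EPERM is 1) to upper layers, which is why the caller maps it explicitly.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bitmap allocator: returns 0 on success, -1 on failure
 * (NOT a negative errno). */
struct toy_bitmap {
	uint32_t used;      /* 32 slots, one bit each */
};

static int toy_bitmap_alloc(struct toy_bitmap *bm, unsigned long *obj)
{
	for (unsigned long i = 0; i < 32; i++) {
		if (!(bm->used & (1u << i))) {
			bm->used |= 1u << i;
			*obj = i;
			return 0;
		}
	}
	return -1;
}

/* Caller: translate the allocator's -1 into a proper errno. */
int toy_pd_alloc(struct toy_bitmap *bm, unsigned long *pdn)
{
	int ret = toy_bitmap_alloc(bm, pdn);

	return ret ? -ENOMEM : 0;
}
```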

15 Aug, 2018 · 4 commits

qedr: Add user space support for SRQ · 40b173dd

Committed by Yuval Bason on Aug 09, 2018

This patch adds support for SRQs created in user space and updates
qedr_affiliated_event to deal with general SRQ events.

Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

qedr: Add support for kernel mode SRQ's · 3491c9e7

Committed by Yuval Bason on Aug 09, 2018

Implement the SRQ-specific verbs and update the poll_cq verb to deal with
SRQ completions.

Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

qedr: Add wrapping generic structure for qpidr and adjust idr routines. · 1212767e

Committed by Yuval Bason on Aug 09, 2018

Today, we use the idr mechanism for QPs only.
This patch prepares the qedr_idr structures and the idr routines for
both QPs and SRQs.

Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/mlx5: Fix leaking stack memory to userspace · 0625b4ba

Committed by Jason Gunthorpe on Aug 14, 2018

mlx5_ib_create_qp_resp was never initialized and only the first 4 bytes
were written.

Fixes: 41d902cb ("RDMA/mlx5: Fix definition of mlx5_ib_create_qp_resp")
Cc: <stable@vger.kernel.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
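
The general pattern behind this fix: zero-initialize a response struct on the stack before copying it to userspace, so bytes the driver never writes cannot leak stack contents. The struct layout and `toy_copy_to_user()` below are hypothetical stand-ins; kernel code commonly writes `= {}`, while this sketch uses `= {0}` for portable C99.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical response: only the first field is filled in by the driver,
 * but userspace receives the whole struct. */
struct toy_create_qp_resp {
	uint32_t bfreg_index;
	uint32_t reserved;
	uint64_t future_field;
};

/* Stand-in for copy_to_user(): copies the full struct to the caller. */
static void toy_copy_to_user(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

void toy_create_qp(void *ubuf, uint32_t bfreg)
{
	/* Without "= {0}" the last 12 bytes would be stack garbage. */
	struct toy_create_qp_resp resp = {0};

	resp.bfreg_index = bfreg;   /* only the first 4 bytes are set */
	toy_copy_to_user(ubuf, &resp, sizeof(resp));
}
```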

13 Aug, 2018 · 1 commit

IB/uverbs: Use uverbs_alloc for allocations · b61815e2

Committed by Jason Gunthorpe on Aug 09, 2018

Several handlers need temporary allocations for the life of the method;
switch them to use the uverbs_alloc allocator.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

11 Aug, 2018 · 1 commit

IB/uverbs: Have the core code create the uverbs_root_spec · 7d96c9b1

Committed by Jason Gunthorpe on Aug 09, 2018

There is no reason for drivers to do this; the core code should take care
of everything. The drivers will provide their information from rodata to
describe their modifications to the core's base uapi specification.

The core uses this to build up the runtime uapi for each device.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

08 Aug, 2018 · 2 commits

iw_cxgb4: pass window scale in flowc work request · 2e51e45c

Committed by Potnuri Bharat Teja on Aug 03, 2018

This lets the FW avoid sending more data to TP (data which would then need
to be buffered). Pass the negotiated TCP window scale to FW in the FLOWC WR.

Also refactor send_flowc() a bit to clean it up.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/mlx5: Fix shift overflow in mlx5_ib_create_wq · 0dfe4522

Committed by Leon Romanovsky on Aug 01, 2018

[   61.182439] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/qp.c:5366:34
[   61.183673] shift exponent 4294967288 is too large for 32-bit type 'unsigned int'
[   61.185530] CPU: 0 PID: 639 Comm: qp Not tainted 4.18.0-rc1-00037-g4aa1d69a9c60-dirty #96
[   61.186981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[   61.188315] Call Trace:
[   61.188661]  dump_stack+0xc7/0x13b
[   61.190427]  ubsan_epilogue+0x9/0x49
[   61.190899]  __ubsan_handle_shift_out_of_bounds+0x1ea/0x22f
[   61.197040]  mlx5_ib_create_wq+0x1c99/0x1d50
[   61.206632]  ib_uverbs_ex_create_wq+0x499/0x820
[   61.213892]  ib_uverbs_write+0x77e/0xae0
[   61.248018]  vfs_write+0x121/0x3b0
[   61.249831]  ksys_write+0xa1/0x120
[   61.254024]  do_syscall_64+0x7c/0x2a0
[   61.256178]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   61.259211] RIP: 0033:0x7f54bab70e99
[   61.262125] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89
[   61.268678] RSP: 002b:00007ffe1541c318 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   61.271076] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54bab70e99
[   61.273795] RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003
[   61.276982] RBP: 00007ffe1541c330 R08: 00000000200078e0 R09: 0000000000000002
[   61.280035] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004005c0
[   61.283279] R13: 00007ffe1541c420 R14: 0000000000000000 R15: 0000000000000000

Cc: <stable@vger.kernel.org> # 4.7
Fixes: 79b20a6c ("IB/mlx5: Add receive Work Queue verbs")
Cc: syzkaller <syzkaller@googlegroups.com>
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
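
The exponent UBSAN reports, 4294967288, is (unsigned int)-8: a user-controlled shift count far outside 0..31. The kernel provides check_shl_overflow() in <linux/overflow.h> for this class of bug; the following is only a self-contained userspace sketch of the same idea, not the code of this patch. `toy_shl_overflow()` and `toy_create_wq()` are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Checked left shift: stores a << s in *d and returns true if the shift
 * is undefined (s >= 32) or if set bits were shifted out. */
static bool toy_shl_overflow(uint32_t a, uint32_t s, uint32_t *d)
{
	if (s >= 32)                     /* shifting by >= width is UB in C */
		return true;
	*d = a << s;
	return (*d >> s) != a;           /* result does not fit in 32 bits */
}

/* Hypothetical WQ setup: wqe_count and wqe_shift come from userspace. */
int toy_create_wq(uint32_t wqe_count, uint32_t wqe_shift, uint32_t *buf_size)
{
	if (toy_shl_overflow(wqe_count, wqe_shift, buf_size))
		return -EINVAL;
	return 0;
}
```

For example, 256 WQEs of 64 bytes (shift 6) give a 16384-byte buffer, while a shift of 4294967288 is rejected before any shift is evaluated.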

03 Aug, 2018 · 5 commits

RDMA/netdev: Use priv_destructor for netdev cleanup · 9f49a5b5

Committed by Jason Gunthorpe on Jul 29, 2018

Now that the unregister_netdev flow for IPoIB no longer relies on external
code we can now introduce the use of priv_destructor and
needs_free_netdev.

The rdma_netdev flow is switched to use the netdev common priv_destructor
instead of the special free_rdma_netdev and the IPOIB ULP adjusted:
 - priv_destructor needs to switch to point to the ULP's destructor
   which will then call the rdma_ndev's in the right order
 - We need to be careful around the error unwind of register_netdev
   as it sometimes calls priv_destructor on failure
 - ULPs need to use ndo_init/uninit to ensure proper ordering
   of failures around register_netdev

Switching to priv_destructor is a necessary prerequisite to using
the rtnl new_link mechanism.

The VNIC user for rdma_netdev should also be revised, but that is left for
another patch.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

iw_cxgb4: Support FW write completion WR · 94245f4a

Committed by Potnuri Bharat Teja on Aug 02, 2018

To optimize NVME-oF READ IOPs, use a specialized WQE that combines
the RDMA WRITE and SEND_INV WR chain submitted by the NVME-oF target
driver.

This reduces uP overhead per NVME-oF IO and results in over 10%
improvement in NVME-oF 4K READ IOPs.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

iw_cxgb4: RDMA write with immediate support · b9855f4c

Committed by Potnuri Bharat Teja on Aug 02, 2018

Add iw_cxgb4 functionality to support the RDMA_WRITE_WITH_IMMEDIATE opcode.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

rdma/cxgb4: fix some info leaks · 8001b717

Committed by Dan Carpenter on Aug 02, 2018

In c4iw_create_qp() there are several struct members which potentially
aren't initialized, like uresp.rq_key.  I've fixed this code before in
commit ae1fe07f ("RDMA/cxgb4: Fix stack info leak in
c4iw_create_qp()") so this time I'm just going to take a big hammer
approach and memset the whole struct to zero.  Hopefully, it will stay
fixed this time.

In c4iw_create_srq() we don't clear uresp.reserved.

Fixes: 6a0b6174 ("rdma/cxgb4: Add support for kernel mode SRQ's")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/hns: Support flush cqe for hip08 in kernel space · 0425e3e6

Committed by Yixian Liu on Aug 02, 2018

According to the IB specification, there are cases in which work requests
must return the flush error completion status through the completion queue.
Due to a hardware limitation, the driver needs to assist the flush process.

This patch adds flush CQE support for hip08 in the cases where it is
needed, such as poll cqe, post send, post recv and aeqe handling.

The patch also takes the compatibility between kernel and user space into
account.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

02 Aug, 2018 · 1 commit

IB/uverbs: Do not pass struct ib_device to the ioctl methods · e83f0ecd

Committed by Jason Gunthorpe on Jul 25, 2018

This does the same as the patch before, except for ioctl. The rules are
the same, but for the ioctl methods the core code handles setting up the
uobject.

- Retrieve the ib_dev from the uobject->context->device. This is
  safe under ioctl as the core has already done rdma_alloc_begin_uobject
  and so CREATE calls are entirely protected by the rwsem.
- Retrieve the ib_dev from uobject->object
- Call ib_uverbs_get_ucontext()

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

01 Aug, 2018 · 4 commits

RDMA: Fix return code check in rdma_set_cq_moderation · 26e551c5

Committed by Kamal Heib on Jul 31, 2018

The proper return code is "-EOPNOTSUPP" when the modify_cq() callback is
not supported; all drivers should generate it and all users should check
for it when detecting unsupported functionality.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com> (for mlx5)
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
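
The convention can be sketched as a core-side wrapper that treats a missing driver callback as "not supported". The ops table, `toy_set_cq_moderation()` and `toy_hw_modify_cq()` are hypothetical simplifications, not the rdma core's structures.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified device ops table. */
struct toy_cq;
struct toy_device_ops {
	int (*modify_cq)(struct toy_cq *cq, uint16_t count, uint16_t period);
};
struct toy_cq {
	const struct toy_device_ops *ops;
	uint16_t count, period;
};

/* Core-side wrapper: an absent callback means "not supported". */
int toy_set_cq_moderation(struct toy_cq *cq, uint16_t count, uint16_t period)
{
	if (!cq->ops->modify_cq)
		return -EOPNOTSUPP;
	return cq->ops->modify_cq(cq, count, period);
}

/* Example driver callback that stores the moderation parameters. */
int toy_hw_modify_cq(struct toy_cq *cq, uint16_t count, uint16_t period)
{
	cq->count = count;
	cq->period = period;
	return 0;
}
```

A caller can then test for `-EOPNOTSUPP` specifically and fall back gracefully, instead of treating every error as fatal.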

rdma/cxgb4: Simplify a structure initialization · dd708e7b

Committed by Bart Van Assche on Jul 31, 2018

This patch prevents sparse from reporting the following warning:

drivers/infiniband/hw/cxgb4/qp.c:2269:34: warning: Using plain integer as NULL pointer

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

rdma/cxgb4: Fix SRQ endianness annotations · eb2463ba

Committed by Bart Van Assche on Jul 31, 2018

This patch prevents sparse from complaining about casts to restricted __be32.

Fixes: a3cdaa69 ("cxgb4: Adds CPL support for Shared Receive Queues")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

rdma/cxgb4: Remove a set-but-not-used variable · 7810e09b

Committed by Bart Van Assche on Jul 31, 2018

This patch prevents the following warning from being reported when building
with W=1:

drivers/infiniband/hw/cxgb4/cm.c:1860:5: warning: variable 'status' set but not used [-Wunused-but-set-variable]
  u8 status;
     ^~~~~~

Fixes: 6a0b6174 ("rdma/cxgb4: Add support for kernel mode SRQ's")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

31 Jul, 2018 · 12 commits

RDMA/hns: Program the tclass and flow label into the hardware · cdfa4ad5

Committed by Lijun Ou on Jul 30, 2018

This was missed in a few places, which were just using 0.

Also correct the spelling of HNS_ROCE_FLOW_LABEL_MASK.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/hns: Use macro instead of magic number · 426c4146

Committed by Lijun Ou on Jul 30, 2018

This patch uses CMD_CSQ_DESC_NUM instead of a magic number to improve
readability.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/hns: Modify qp will return errno when qp type is illegal · ac7cbf96

Committed by Lijun Ou on Jul 30, 2018

Setting ret was missing in the error path here, resulting in an incorrect
error code from modify_qp.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/hns: Assign the value for vlan field of qp context · c8e46f8d

Committed by Lijun Ou on Jul 30, 2018

This patch fills the correct value into the vlan id field of the qp
context and updates the vlan field name according to the latest
hardware user manual.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/hns: Only assign the fields of the av if IB_QP_AV bit is set · 610b8967

Committed by Lijun Ou on Jul 30, 2018

Only when the IB_QP_AV flag of attr_mask is set is it valid to assign the
related fields of the av into the qp context.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
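
A minimal sketch of the attr_mask check: the address-vector fields of the attributes are only copied into the context when the caller flagged them as valid. `TOY_QP_AV` is a hypothetical stand-in for IB_QP_AV, and the structures are simplified placeholders, not the hns definitions.

```c
#include <stdint.h>

#define TOY_QP_AV (1u << 7)   /* stand-in for the IB_QP_AV mask bit */

struct toy_ah_attr { uint8_t sl; uint8_t tclass; uint32_t flow_label; };
struct toy_qp_attr { struct toy_ah_attr ah_attr; };
struct toy_qp_context { uint8_t sl; uint8_t tclass; uint32_t flow_label; };

void toy_fill_qp_context(struct toy_qp_context *ctx,
			 const struct toy_qp_attr *attr, uint32_t attr_mask)
{
	/* attr->ah_attr is only meaningful when the caller set the AV bit;
	 * otherwise it may hold stale values that must not reach hardware. */
	if (attr_mask & TOY_QP_AV) {
		ctx->sl = attr->ah_attr.sl;
		ctx->tclass = attr->ah_attr.tclass;
		ctx->flow_label = attr->ah_attr.flow_label & 0xfffff; /* 20 bits */
	}
}
```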

RDMA/providers: Remove pointless functions · 1ffba626

Committed by Kamal Heib on Jul 27, 2018

The rdma core takes care of returning the right error code when the
rdma device callbacks aren't supported.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/providers: Fix return value from create_srq callbacks · 8380b74e

Committed by Kamal Heib on Jul 30, 2018

The proper return code is "-EOPNOTSUPP" when the create_srq() callback
is not supported.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/mlx4: Use 4K pages for kernel QP's WQE buffer · f95ccffc

Committed by Jack Morgenstein on Jul 26, 2018

In the current implementation, the driver tries to allocate contiguous
memory, and if it fails, it falls back to 4K fragmented allocation.

Once the memory is fragmented, the first allocation might take a lot
of time, and even fail, which can cause connection failures.

This patch changes the logic to always allocate with 4K granularity,
since it's more robust and more likely to succeed.

This patch was tested with Lustre and no performance degradation
was observed.

Note: This commit eliminates the "shrinking WQE" feature. This feature
depended on using vmap to create a virtually contiguous send WQ.
vmap use was abandoned due to problems with several processors (see the
commit cited below). As a result, shrinking WQE was available only with
physically contiguous send WQs. Allocating such send WQs caused the
problems described above.
Therefore, as a side effect of eliminating the use of large physically
contiguous send WQs, the shrinking WQE feature became unavailable.

Warning example:
worker/20:1: page allocation failure: order:8, mode:0x80d0
CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G OE ------------
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<ffffffff81686d81>] dump_stack+0x19/0x1b
[<ffffffff81186160>] warn_alloc_failed+0x110/0x180
[<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0
[<ffffffff811ce868>] alloc_pages_current+0x98/0x110
[<ffffffff81184fae>] __get_free_pages+0xe/0x50
[<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150
[<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50
[<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
[<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core]
[<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib]
[<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0
[<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib]
[<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib]
[<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
[<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core]
[<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm]
[<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd]
[<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs]
[<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd]

Fixes: 73898db0 ("net/mlx4: Avoid wrong virtual mappings")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
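
The "order:8" failure in the warning corresponds to a request for 2^8 = 256 physically contiguous 4 KiB pages (1 MiB). The sketch below illustrates the alternative: many independent 4 KiB chunks plus an index-to-address helper. It assumes WQE sizes are powers of two no larger than 4 KiB, so a WQE never straddles a chunk; all `toy_*` names are hypothetical, and the real driver additionally has to describe the page list to the HW, which is omitted here.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SHIFT 12                  /* 4 KiB chunks */
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)

/* Hypothetical fragmented WQE buffer: independent 4 KiB chunks instead of
 * one high-order contiguous allocation. */
struct toy_frag_buf {
	size_t npages;
	uint8_t **pages;
};

int toy_frag_buf_alloc(struct toy_frag_buf *buf, size_t size)
{
	buf->npages = (size + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT;
	buf->pages = calloc(buf->npages, sizeof(*buf->pages));
	if (!buf->pages) {
		buf->npages = 0;
		return -1;
	}
	for (size_t i = 0; i < buf->npages; i++) {
		buf->pages[i] = calloc(1, TOY_PAGE_SIZE);
		if (!buf->pages[i])
			return -1;   /* caller cleans up with toy_frag_buf_free() */
	}
	return 0;
}

void toy_frag_buf_free(struct toy_frag_buf *buf)
{
	for (size_t i = 0; i < buf->npages; i++)
		free(buf->pages[i]);   /* free(NULL) is a no-op */
	free(buf->pages);
}

/* Locate WQE n, where each WQE is 2^wqe_shift bytes (wqe_shift <= 12). */
void *toy_get_wqe(const struct toy_frag_buf *buf, unsigned int n,
		  unsigned int wqe_shift)
{
	size_t offset = (size_t)n << wqe_shift;

	return buf->pages[offset >> TOY_PAGE_SHIFT] +
	       (offset & (TOY_PAGE_SIZE - 1));
}
```

With 64-byte WQEs, WQE 64 lands at the start of the second chunk and WQE 65 at byte 64 of it.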

IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language · bccd0622

Committed by Jason Gunthorpe on Jul 26, 2018

This clearly indicates that the input is a bitwise combination of values
in an enum, and identifies which enum contains the definition of the bits.

Special accessors are provided that handle the mandatory validation of the
allowed bits and enforce the correct type for bitwise flags.

If we had introduced this at the start then the kabi would have uniformly
used u64 data to pass flags; however, today there is a mixture of u64 and
u32 flags. All places are converted to accept both sizes and the accessor
fixes it. This allows all existing flags to grow to u64 in future without
any hassle.

Finally all flags are, by definition, optional. If flags are not passed
the accessor does not fail, but provides a value of zero.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
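
The accessor semantics described above (optional, 32- or 64-bit input, mandatory validation of allowed bits) can be sketched as follows. `struct toy_attr` and `toy_get_flags64()` are a simplified, hypothetical stand-in, not the kernel's uverbs accessor or its attribute bundle.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute from userspace: may be absent, or carry either a
 * 32-bit or a 64-bit flags value (zero-extended into data). */
struct toy_attr {
	int present;
	size_t len;          /* 4 or 8 */
	uint64_t data;
};

/* Absent => 0, unsupported size => -EINVAL, unknown bits => -EINVAL. */
int toy_get_flags64(uint64_t *out, const struct toy_attr *attr,
		    uint64_t allowed_bits)
{
	if (!attr || !attr->present) {
		*out = 0;                 /* flags are optional by definition */
		return 0;
	}
	if (attr->len != sizeof(uint32_t) && attr->len != sizeof(uint64_t))
		return -EINVAL;
	if (attr->data & ~allowed_bits)
		return -EINVAL;           /* reject bits outside the enum */
	*out = attr->data;
	return 0;
}
```

Accepting both sizes in one place is what lets a u32 flags attribute later grow to u64 without touching each handler.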

RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const · d34ac5cd

Committed by Bart Van Assche on Jul 18, 2018

Since neither ib_post_send() nor ib_post_recv() modifies the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) functions really do not modify the posted
work request.

To make this possible, only one cast that casts away constness had to be
introduced, namely in rpcrdma_post_recvs(). The only way I can think of to
avoid that cast is to introduce an additional loop in that function or to
change the data type of bad_wr from struct ib_recv_wr ** into int
(an index that refers to an element in the work request list). However,
both approaches would require even more extensive changes than this
patch.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
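
The sketch below shows why constifying `wr` forces `bad_wr` to become const as well: storing a `const` request pointer through a non-const `**bad_wr` would discard the qualifier. `struct toy_send_wr` and `toy_post_send()` are hypothetical simplifications, not the ib_verbs definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified work request chain. */
struct toy_send_wr {
	struct toy_send_wr *next;
	uint64_t wr_id;
	int num_sge;
};

/* The posted list and *bad_wr are both const: the "driver" may read the
 * requests but cannot modify them, and the compiler enforces it. */
int toy_post_send(const struct toy_send_wr *wr,
		  const struct toy_send_wr **bad_wr, int max_sge)
{
	for (; wr; wr = wr->next) {
		if (wr->num_sge > max_sge) {
			*bad_wr = wr;     /* report the first rejected request */
			return -EINVAL;
		}
		/* wr->num_sge = max_sge;  <- would now fail to compile */
	}
	return 0;
}
```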

IB/mlx5, ib_post_send(), IB_WR_REG_SIG_MR: Do not modify the 'wr' argument · 7bb1fafc

Committed by Bart Van Assche on Jul 18, 2018

Since the next patch will constify the wr pointer, do not modify the data
that pointer points at.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA: Constify the argument of the work request conversion functions · f696bf6d

Committed by Bart Van Assche on Jul 18, 2018

When posting a send work request, the work request that is posted is not
modified by any of the RDMA drivers. Make this explicit by constifying
most ib_send_wr pointers in RDMA transport drivers.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

27 Jul, 2018 · 3 commits

RDMA/hns: Enable modify_cq for uverbs. · df065107

Committed by Lijun Ou on Jul 25, 2018

The driver implements the modify_cq callback, but did not set the bit to
expose it to userspace.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

RDMA/hns: Update the data type of immediate data · 0c4a0e29

Committed by Lijun Ou on Jul 25, 2018

Because the hip08 data structures are little endian, the immediate field
of the wqe and cqe needs to be declared as __le32.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
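
A field declared `__le32` must always be written with an explicit little-endian conversion so the layout is the same on any host CPU. The helpers below are portable userspace stand-ins for the kernel's cpu_to_le32()/le32_to_cpu(); the `struct toy_wqe` layout is hypothetical.

```c
#include <stdint.h>

/* Store/load a 32-bit value in little-endian byte order on any host. */
static void toy_put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

static uint32_t toy_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical WQE with a little-endian immediate-data field. */
struct toy_wqe {
	uint8_t imm_data[4];
};

void toy_wqe_set_imm(struct toy_wqe *wqe, uint32_t imm)
{
	toy_put_le32(wqe->imm_data, imm);
}

uint32_t toy_wqe_get_imm(const struct toy_wqe *wqe)
{
	return toy_get_le32(wqe->imm_data);
}
```

Storing 0x11223344 places byte 0x44 first in memory, whether the host is little or big endian.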

RDMA/hns: Use delay instead of usleep · 73b4e1f4

Committed by Lijun Ou on Jul 25, 2018

To avoid calling usleep while holding a lock, use a delay function
instead. Also add parentheses to make the order of evaluation explicit.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

openeuler / Kernel synced successfully almost 2 years ago
