提交 · 51d0a2b4cfa9979fd8a59faf483b4e84587ab4ea · openeuler / Kernel

13 8月, 2018 4 次提交

J
IB/uverbs: Remove struct uverbs_root_spec and all supporting code · 51d0a2b4
由 Jason Gunthorpe 提交于 8月 09, 2018
```
Everything now uses the uverbs_uapi data structure.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
```
51d0a2b4

IB/uverbs: Use uverbs_api to unmarshal ioctl commands · 3a863577

由 Jason Gunthorpe 提交于 8月 09, 2018

Convert the ioctl method syscall path to use the uverbs_api data
structures. The new uapi structure includes all the same information, just
in a different and more optimal way.

 - Use attr_bkey instead of 2 level radix trees for everything related to
   attributes. This includes the attribute storage, presence, and
   detection of missing mandatory attributes.
 - Avoid iterating over all attribute storage at finish, instead use
   find_first_bit with the attr_bkey to locate only those attrs that need
   cleanup.
 - Organize things to always run, and always rely on, cleanup. This
   avoids a bunch of tricky error unwind cases.
 - Locate the method using the radix tree, and locate the attributes
   using a very efficient incremental radix tree lookup
 - Use the precomputed destroy_bkey to handle uobject destruction
 - Use the precomputed allocation sizes and precomputed 'need_stack'
   to avoid maths in the fast path. This is optimal if userspace
   does not pass (many) unsupported attributes.

Overall this results in much better codegen for the attribute accessors,
everything is now stored in bitmaps or linear arrays indexed by attr_bkey.
The compiler can compute attr_bkey values at compile time for all method
attributes, meaning things like uverbs_attr_is_valid() now compile into
single instruction bit tests.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

3a863577

IB/uverbs: Use uverbs_alloc for allocations · b61815e2

由 Jason Gunthorpe 提交于 8月 09, 2018

Several handlers need temporary allocations for the life of the method,
switch them to use the uverbs_alloc allocator.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>

b61815e2

IB/uverbs: Add a simple allocator to uverbs_attr_bundle · 461bb2ee

由 Jason Gunthorpe 提交于 8月 09, 2018

This is similar in spirit to devm, it keeps track of any allocations
linked to this method call and ensures they are all freed when the method
exits. Further, if there is space in the internal/onstack buffer then the
allocator will hand out that memory and avoid an expensive call to
kalloc/kfree in the syscall path.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>

461bb2ee

11 8月, 2018 5 次提交

IB/uverbs: Remove the ib_uverbs_attr pointer from each attr · 6a1f444f

由 Jason Gunthorpe 提交于 8月 09, 2018

Memory in the bundle is valuable, do not waste it holding an 8 byte
pointer for the rare case of writing to a PTR_OUT. We can compute the
pointer by storing a small 1 byte array offset and the base address of the
uattr memory in the bundle private memory.

This also means we can access the kernel's copy of the ib_uverbs_attr, so
drop the copy of flags as well.

Since the uattr base should be private bundle information this also
de-inlines the already too big uverbs_copy_to inline and moves
create_udata into uverbs_ioctl.c so they can see the private struct
definition.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>

6a1f444f

IB/uverbs: Provide implementation private memory for the uverbs_attr_bundle · 4b3dd2bb

由 Jason Gunthorpe 提交于 8月 09, 2018

This already existed as the anonymous 'ctx' structure, but this was not
really a useful form. Hoist this struct into bundle_priv and rework the
internal things to use it instead.

Move a bunch of the processing internal state into the priv and reduce the
excessive use of function arguments.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>

4b3dd2bb

IB/uverbs: Use uverbs_api to manage the object type inside the uobject · 6b0d08f4

由 Jason Gunthorpe 提交于 8月 09, 2018

Currently the struct uverbs_obj_type stored in the ib_uobject is part of
the .rodata segment of the module that defines the object. This is a
problem if drivers define new uapi objects as we will be left with a
dangling pointer after device disassociation.

Switch the uverbs_obj_type for struct uverbs_api_object, which is
allocated memory that is part of the uverbs_api and is guaranteed to
always exist. Further this moves the 'type_class' into this memory which
means access to the IDR/FD function pointers is also guaranteed. Drivers
cannot define new types.

This makes it safe to continue to use all uobjects, including driver
defined ones, after disassociation.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

6b0d08f4

IB/uverbs: Build the specs into a radix tree at runtime · 9ed3e5f4

由 Jason Gunthorpe 提交于 8月 09, 2018

This radix tree datastructure is intended to replace the 'hash' structure
used today for parsing ioctl methods during system calls. This first
commit introduces the structure and builds it from the existing .rodata
descriptions.

The so-called hash arrangement is actually a 5 level open coded radix tree.
This new version uses a 3 level radix tree built using the radix tree
library.

Overall this is much less code and much easier to build as the radix tree
API allows for dynamic modification during the building. There is a small
memory penalty to pay for this, but since the radix tree is allocated on
a per device basis, a few kb of RAM seems immaterial considering the
gained simplicity.

The radix tree is similar to the existing tree, but also has a 'attr_bkey'
concept, which is a small value'd index for each method attribute. This is
used to simplify and improve performance of everything in the next
patches.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>

9ed3e5f4

IB/uverbs: Have the core code create the uverbs_root_spec · 7d96c9b1

由 Jason Gunthorpe 提交于 8月 09, 2018

There is no reason for drivers to do this, the core code should take of
everything. The drivers will provide their information from rodata to
describe their modifications to the core's base uapi specification.

The core uses this to build up the runtime uapi for each device.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>

7d96c9b1

10 8月, 2018 1 次提交

IB/uverbs: Fix reading of 32 bit flags · 922983c2

由 Jason Gunthorpe 提交于 8月 09, 2018

This is missing a zeroing of the high bits of flags, and is also not
correct for big endian machines. Properly zero extend the 32 bit flags
into the 64 bit stack variable.
Reported-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Fixes: bccd0622 ("IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language")
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>

922983c2

08 8月, 2018 6 次提交

RDMA/rxe: Set wqe->status correctly if an unexpected response is received · 61b717d0

由 Bart Van Assche 提交于 6月 26, 2018

Every function that returns COMPST_ERROR must set wqe->status to another
value than IB_WC_SUCCESS before returning COMPST_ERROR. Fix the only code
path for which this is not yet the case.
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

61b717d0

iw_cxgb4: pass window scale in flowc work request · 2e51e45c

由 Potnuri Bharat Teja 提交于 8月 03, 2018

This will allow FW to not send more data to TP (which would then need to
be buffered). Pass the negotiated TCP window scale to FW in the FLOWC WR.

Also refactor send_flowc() a bit to clean it up.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NPotnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

2e51e45c

RDMA/mlx5: Fix shift overflow in mlx5_ib_create_wq · 0dfe4522

由 Leon Romanovsky 提交于 8月 01, 2018

[   61.182439] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/qp.c:5366:34
[   61.183673] shift exponent 4294967288 is too large for 32-bit type 'unsigned int'
[   61.185530] CPU: 0 PID: 639 Comm: qp Not tainted 4.18.0-rc1-00037-g4aa1d69a9c60-dirty #96
[   61.186981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[   61.188315] Call Trace:
[   61.188661]  dump_stack+0xc7/0x13b
[   61.190427]  ubsan_epilogue+0x9/0x49
[   61.190899]  __ubsan_handle_shift_out_of_bounds+0x1ea/0x22f
[   61.197040]  mlx5_ib_create_wq+0x1c99/0x1d50
[   61.206632]  ib_uverbs_ex_create_wq+0x499/0x820
[   61.213892]  ib_uverbs_write+0x77e/0xae0
[   61.248018]  vfs_write+0x121/0x3b0
[   61.249831]  ksys_write+0xa1/0x120
[   61.254024]  do_syscall_64+0x7c/0x2a0
[   61.256178]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   61.259211] RIP: 0033:0x7f54bab70e99
[   61.262125] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89
[   61.268678] RSP: 002b:00007ffe1541c318 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   61.271076] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54bab70e99
[   61.273795] RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003
[   61.276982] RBP: 00007ffe1541c330 R08: 00000000200078e0 R09: 0000000000000002
[   61.280035] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004005c0
[   61.283279] R13: 00007ffe1541c420 R14: 0000000000000000 R15: 0000000000000000

Cc: <stable@vger.kernel.org> # 4.7
Fixes: 79b20a6c ("IB/mlx5: Add receive Work Queue verbs")
Cc: syzkaller <syzkaller@googlegroups.com>
Reported-by: NNoa Osherovich <noaos@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0dfe4522

test_overflow: Add shift overflow tests · d36b6ad2

由 Kees Cook 提交于 8月 01, 2018

This adds overflow tests for the new check_shift_overflow() helper to
validate overflow, signedness glitches, storage glitches, etc.
Co-developed-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d36b6ad2

overflow.h: Add arithmetic shift helper · 0c668477

由 Jason Gunthorpe 提交于 8月 01, 2018

Add shift_overflow() helper to assist driver authors in ensuring that
shift operations don't cause overflows or other odd conditions.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
[kees: tweaked comments and commit log, dropped unneeded assignment]
Signed-off-by: NKees Cook <keescook@chromium.org>

0c668477

IB/ucm: Initialize sgid request GID attribute pointer · 58796e67

由 Parav Pandit 提交于 8月 06, 2018

sgid_attr is uninitialized on the stack, initialize it to NULL.

Fixes: 39839107 ("IB/cm: Replace members of sa_path_rec with 'struct sgid_attr *'")
Signed-off-by: NParav Pandit <parav@mellanox.com>
Reviewed-by: NYossi Itigin <yosefe@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

58796e67

03 8月, 2018 13 次提交

IB/ipoib: Consolidate checking of the proposed child interface · 76010976

由 Jason Gunthorpe 提交于 7月 29, 2018

Move all the checking for pkey and other validity to the __ipoib_vlan_add
function. This removes the last difference from the control flow
of the __ipoib_vlan_add to make the overall design simpler to
understand.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

76010976

IB/ipoib: Maintain the child_intfs list from ndo_init/uninit · 13476d35

由 Jason Gunthorpe 提交于 7月 29, 2018

This fixes a bug in the netlink path where the vlan_rwsem was not
held around __ipoib_vlan_add causing the child_intfs to be manipulated
unsafely.

In the process this greatly simplifies the vlan_rwsem write side locking
to only cover a single non-sleeping statement.

This also further increases the safety of the removal ordering by holding
the netdev of the parent while the child is active to ensure most bugs
become either an oops on a NULL priv or a deadlock on the netdev refcount.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

13476d35

IB/ipoib: Do not remove child devices from within the ndo_uninit · 25405d98

由 Jason Gunthorpe 提交于 7月 29, 2018

Switching to priv_destructor and needs_free_netdev created a subtle
ordering problem in ipoib_remove_one.

Now that unregister_netdev frees the netdev and priv we must ensure that
the children are unregistered before trying to unregister the parent,
or child unregister will use after free.

The solution is to unregister the children, then parent, in the same batch
all while holding the rtnl_lock. This closes all the races where a new
child could have been added and ensures proper ordering.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

25405d98

IB/ipoib: Get rid of the sysfs_mutex · ee190ab7

由 Jason Gunthorpe 提交于 7月 29, 2018

This mutex was introduced to deal with the deadlock formed by calling
unregister_netdev from within the sysfs callback of a netdev.

Now that we have priv_destructor and needs_free_netdev we can switch
to the more targeted solution of running the unregister from a
work queue. This avoids the deadlock and gets rid of the mutex.

The next patch in the series needs this mutex eliminated to create
atomicity of unregisteration.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

ee190ab7

RDMA/netdev: Use priv_destructor for netdev cleanup · 9f49a5b5

由 Jason Gunthorpe 提交于 7月 29, 2018

Now that the unregister_netdev flow for IPoIB no longer relies on external
code we can now introduce the use of priv_destructor and
needs_free_netdev.

The rdma_netdev flow is switched to use the netdev common priv_destructor
instead of the special free_rdma_netdev and the IPOIB ULP adjusted:
 - priv_destructor needs to switch to point to the ULP's destructor
   which will then call the rdma_ndev's in the right order
 - We need to be careful around the error unwind of register_netdev
   as it sometimes calls priv_destructor on failure
 - ULPs need to use ndo_init/uninit to ensure proper ordering
   of failures around register_netdev

Switching to priv_destructor is a necessary pre-requisite to using
the rtnl new_link mechanism.

The VNIC user for rdma_netdev should also be revised, but that is left for
another patch.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NDenis Drozdov <denisd@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

9f49a5b5

IB/ipoib: Move init code to ndo_init · eaeb3984

由 Jason Gunthorpe 提交于 7月 29, 2018

Now that we have a proper ndo_uninit, move code that naturally pairs
with the ndo_uninit into ndo_init. This allows the netdev core to natually
handle ordering.

This fixes the situation where register_netdev can fail before calling
ndo_init, in which case it wouldn't call ndo_uninit either.

Also move a bunch of duplicated init code that is shared between child
and parent for clarity. Now the child and parent register functions look
very similar.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

eaeb3984

IB/ipoib: Move all uninit code into ndo_uninit · 7cbee87c

由 Jason Gunthorpe 提交于 7月 29, 2018

Currently uninit is sometimes done twice in error flows, and is sprinkled
a bit all over the place.

Improve the clarity of the design by moving all uninit only into
ndo_uinit.

Some duplication is removed:
 - Sometimes IPOIB_STOP_NEIGH_GC was done before unregister, but
   this duplicates the process in ipoib_neigh_hash_init
 - Flushing priv->wq was sometimes done before unregister,
   but that duplicates what has been done in ndo_uninit

Uniniting the IB event queue must remain before unregister_netdev as it
requires the RTNL lock to be dropped, this is moved to a helper to make
that flow really clear and remove some duplication in error flows.

If register_netdev fails (and ndo_init is NULL) then it almost always
calls ndo_uninit, which lets us remove all the extra code from the error
unwinds. The next patch in the series will close the 'almost always' hole
by pairing a proper ndo_init with ndo_uninit.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

7cbee87c

IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task · cda8daf1

由 Erez Shitrit 提交于 7月 29, 2018

The neigh_reap_task is self restarting, but so long as we call
cancel_delayed_work_sync() it will be guaranteed to not be running and
never start again. Thus we don't need to have the racy
IPOIB_STOP_NEIGH_GC bit, or the confusing mismatch of places sometimes
calling flush_workqueue after the cancel.

This fixes a situation where the GC work could have been left running
in some rare situations.
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

cda8daf1

IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN · 577e07ff

由 Jason Gunthorpe 提交于 7月 29, 2018

This essentially duplicates the netdev's reg_state, so just use that
directly. The reg_state is updated under the rntl_lock, and all places
using GOING_DOWN already acquire the rtnl_lock so checking is safe.

Since the only place we use GOING_DOWN is for the parent device this
does not fix any bugs, but it is a step to tidy up the unregister flow
so that after later patches the flow is uniform and sane.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

577e07ff

iw_cxgb4: Support FW write completion WR · 94245f4a

由 Potnuri Bharat Teja 提交于 8月 02, 2018

To optimize NVME-oF READ IOPs, use a specialized WQE that combines
the RDMA WRITE and SEND_INV WR chain submitted by the NVME-oF target
driver.

This reduces uP overhead per NVME-oF IO, and results in over 10%
improvement in NVME-oF 4K READ IOPs.
Signed-off-by: NPotnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

94245f4a

iw_cxgb4: RDMA write with immediate support · b9855f4c

由 Potnuri Bharat Teja 提交于 8月 02, 2018

Adds iw_cxgb4 functionality to support RDMA_WRITE_WITH_IMMEDATE opcode.
Signed-off-by: NPotnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

b9855f4c

rdma/cxgb4: fix some info leaks · 8001b717

由 Dan Carpenter 提交于 8月 02, 2018

In c4iw_create_qp() there are several struct members which potentially
aren't inintialized like uresp.rq_key.  I've fixed this code before in
in commit ae1fe07f ("RDMA/cxgb4: Fix stack info leak in
c4iw_create_qp()") so this time I'm just going to take a big hammer
approach and memset the whole struct to zero.  Hopefully, it will stay
fixed this time.

In c4iw_create_srq() we don't clear uresp.reserved.

Fixes: 6a0b6174 ("rdma/cxgb4: Add support for kernel mode SRQ's")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NRaju Rangoju <rajur@chelsio.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

8001b717

RDMA/hns: Support flush cqe for hip08 in kernel space · 0425e3e6

由 Yixian Liu 提交于 8月 02, 2018

According to IB protocol, there are some cases that work requests must
return the flush error completion status through the completion queue. Due
to hardware limitation, the driver needs to assist the flush process.

This patch adds the support of flush cqe for hip08 in the cases that
needed, such as poll cqe, post send, post recv and aeqe handle.

The patch also considered the compatibility between kernel and user space.
Signed-off-by: NYixian Liu <liuyixian@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0425e3e6

02 8月, 2018 11 次提交

IB/IPoIB: Set ah valid flag in multicast send flow · 75da9606

由 Denis Drozdov 提交于 7月 29, 2018

The change of ipoib_ah data structure with adding "valid" flag and
checks of ah->valid in ipoib_start_xmit affected multicast packet flow.

Since the multicast flow doesn't invoke path_rec_start, "ah->valid" flag
remains unset, so that ipoib_start_xmit end up with neigh_refresh_path
instead of sending the packet using neigh.

"ah->valid" has to be set in multicast send flow. As a result IPoIB
starts sending packets via neigh immediately and eliminates 60sec delay
of neigh keep alive interval.

The typical example of this issue are two sequential arpings:

arping 11.134.208.9 -> got response (mcast_send)
arping 11.134.208.9 -> no response  (ah->valid = 0)

Fixes: fa9391db ("RDMA/ipoib: Update paths on CLIENT_REREG/SM_CHANGE events")
Signed-off-by: NDenis Drozdov <denisd@mellanox.com>
Reviewed-by: NErez Shitrit <erezsh@mellanox.com>
Reviewed-by: NFeras Daoud <ferasda@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

75da9606

IB/uverbs: Allow all DESTROY commands to succeed after disassociate · 0f50d88a

由 Jason Gunthorpe 提交于 7月 25, 2018

The disassociate function was broken by design because it failed all
commands. This prevents userspace from calling destroy on a uobject after
it has detected a device fatal error and thus reclaiming the resources in
userspace is prevented.

This fix is now straightforward, when anything destroys a uobject that is
not the user the object remains on the IDR with a NULL context and object
pointer. All lookup locking modes other than DESTROY will fail. When the
user ultimately calls the destroy function it is simply dropped from the
IDR while any related information is returned.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0f50d88a

IB/uverbs: Do not block disassociate during write() · a9b66d64

由 Jason Gunthorpe 提交于 7月 25, 2018

Now that all the callbacks are safe to run concurrently with
disassociation this test can be eliminated. The ufile core infrastructure
becomes entirely self contained and is not sensitive to disassociation.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

a9b66d64

IB/uverbs: Do not pass struct ib_device to the ioctl methods · e83f0ecd

由 Jason Gunthorpe 提交于 7月 25, 2018

This does the same as the patch before, except for ioctl. The rules are
the same, but for the ioctl methods the core code handles setting up the
uobject.

- Retrieve the ib_dev from the uobject->context->device. This is
  safe under ioctl as the core has already done rdma_alloc_begin_uobject
  and so CREATE calls are entirely protected by the rwsem.
- Retrieve the ib_dev from uobject->object
- Call ib_uverbs_get_ucontext()
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

e83f0ecd

IB/uverbs: Do not pass struct ib_device to the write based methods · bbd51e88

由 Jason Gunthorpe 提交于 7月 25, 2018

This is a step to get rid of the global check for disassociation. In this
model, the ib_dev is not proven to be valid by the core code and cannot be
provided to the method. Instead, every method decides if it is able to
run after disassociation and obtains the ib_dev using one of three
different approaches:

- Call srcu_dereference on the udevice's ib_dev. As before, this means
  the method cannot be called after disassociation begins.
  (eg alloc ucontext)
- Retrieve the ib_dev from the ucontext, via ib_uverbs_get_ucontext()
- Retrieve the ib_dev from the uobject->object after checking
  under SRCU if disassociation has started (eg uobj_get)

Largely, the code is all ready for this, the main work is to provide a
ib_dev after calling uobj_alloc(). The few other places simply use
ib_uverbs_get_ucontext() to get the ib_dev.

This flexibility will let the next patches allow destroy to operate
after disassociation.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

bbd51e88

IB/uverbs: Lower the test for ongoing disassociation · cc2e14e6

由 Jason Gunthorpe 提交于 7月 25, 2018

Commands that are reading/writing to objects can test for an ongoing
disassociation during their initial call to rdma_lookup_get_uobject.  This
directly prevents all of these commands from conflicting with an ongoing
disassociation.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

cc2e14e6

IB/uverbs: Allow uobject allocation to work concurrently with disassociate · 1e857e65

由 Jason Gunthorpe 提交于 7月 25, 2018

After all the recent structural changes this is now straightforward, hold
the hw_destroy_rwsem across the entire uobject creation. We already take
this semaphore on the success path, so holding it a bit longer is not
going to change the performance.

After this change none of the create callbacks require the
disassociate_srcu lock to be correct.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

1e857e65

IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate · 7452a3c7

由 Jason Gunthorpe 提交于 7月 25, 2018

After all the recent structural changes this is now straightfoward, hoist
the hw_destroy_rwsem up out of rdma_destroy_explicit and wrap it around
the uobject write lock as well as the destroy.

This is necessary as obtaining a write lock concurrently with
uverbs_destroy_ufile_hw() will cause malfunction.

After this change none of the destroy callbacks require the
disassociate_srcu lock to be correct.

This requires introducing a new lookup mode, UVERBS_LOOKUP_DESTROY as the
IOCTL interface needs to hold an unlocked kref until all command
verification is completed.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

7452a3c7

IB/uverbs: Convert 'bool exclusive' into an enum · 9867f5c6

由 Jason Gunthorpe 提交于 7月 25, 2018

This is more readable, and future patches will need a 3rd lookup type.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

9867f5c6

IB/uverbs: Consolidate uobject destruction · 87ad80ab

由 Jason Gunthorpe 提交于 7月 25, 2018

There are several flows that can destroy a uobject and each one is
minimized and sprinkled throughout the code base, making it difficult to
understand and very hard to modify the destroy path.

Consolidate all of these into uverbs_destroy_uobject() and call it in all
cases where a uobject has to be destroyed.

This makes one change to the lifecycle, during any abort (eg when
alloc_commit is not called) we always call out to alloc_abort, even if
remove_commit needs to be called to delete a HW object.

This also renames RDMA_REMOVE_DURING_CLEANUP to RDMA_REMOVE_ABORT to
clarify its actual usage and revises some of the comments to reflect what
the life cycle is for the type implementation.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

87ad80ab

IB/uverbs: Make the write path destroy methods use the same flow as ioctl · 32ed5c00

由 Jason Gunthorpe 提交于 7月 25, 2018

The ridiculous dance with uobj_remove_commit() is not needed, the write
path can follow the same flow as ioctl - lock and destroy the HW object
then use the data left over in the uobject to form the response to
userspace.

Two helpers are introduced to make this flow straightforward for the
caller.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

32ed5c00

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功