提交 · ece8ea7bfac053cf27c24e1a767b796a4db2fbd7 · openeuler / Kernel

21 9月, 2018 2 次提交

RDMA/umem: Do not use current->tgid to track the mm_struct · d4b4dd1b

由 Jason Gunthorpe 提交于 9月 16, 2018

This is just wrong, the process that calls into the reg_mr is the process
associated with the umem, and that does not have to be the same process
that created the context.

When this code was first written mmgrab() didn't exist, however these days
we can just directly hold the mm_struct pointer in the umem and have no
ambiguity when it comes to releasing the umem as to which mm it was
associated with.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d4b4dd1b

RDMA/ucontext: Add a core API for mmaping driver IO memory · 5f9794dc

由 Jason Gunthorpe 提交于 9月 16, 2018

To support disassociation and PCI hot unplug, we have to track all the
VMAs that refer to the device IO memory. When disassociation occurs the
VMAs have to be revised to point to the zero page, not the IO memory, to
allow the physical HW to be unplugged.

The three drivers supporting this implemented three different versions
of this algorithm, all leaving something to be desired. This new common
implementation has a few differences from the driver versions:

- Track all VMAs, including splitting/truncating/etc. Tie the lifetime of
  the private data allocation to the lifetime of the vma. This avoids any
  tricks with setting vm_ops which Linus didn't like. (see link)
- Support multiple mms, and support properly tracking mmaps triggered by
  processes other than the one first opening the uverbs fd. This makes
  fork behavior of disassociation enabled drivers the same as fork support
  in normal drivers.
- Don't use crazy get_task stuff.
- Simplify the approach for to racing between vm_ops close and
  disassociation, fixing the related bugs most of the driver
  implementations had. Since we are in core code the tracking list can be
  placed in struct ib_uverbs_ufile, which has a lifetime strictly longer
  than any VMAs created by mmap on the uverbs FD.

Link: https://www.spinics.net/lists/stable/msg248747.html
Link: https://lkml.kernel.org/r/CA+55aFxJTV_g46AQPoPXen-UPiqR1HGMZictt7VpC-SMFbm3Cw@mail.gmail.comSigned-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5f9794dc

18 9月, 2018 1 次提交

IB/rxe: Revise the ib_wr_opcode enum · 9a59739b

由 Jason Gunthorpe 提交于 8月 14, 2018

This enum has become part of the uABI, as both RXE and the
ib_uverbs_post_send() command expect userspace to supply values from this
enum. So it should be properly placed in include/uapi/rdma.

In userspace this enum is called 'enum ibv_wr_opcode' as part of
libibverbs.h. That enum defines different values for IB_WR_LOCAL_INV,
IB_WR_SEND_WITH_INV, and IB_WR_LSO. These were introduced (incorrectly, it
turns out) into libiberbs in 2015.

The kernel has changed its mind on the numbering for several of the IB_WC
values over the years, but has remained stable on IB_WR_LOCAL_INV and
below.

Based on this we can conclude that there is no real user space user of the
values beyond IB_WR_ATOMIC_FETCH_AND_ADD, as they have never worked via
rdma-core. This is confirmed by inspection, only rxe uses the kernel enum
and implements the latter operations. rxe has clearly never worked with
these attributes from userspace. Other drivers that support these opcodes
implement the functionality without calling out to the kernel.

To make IB_WR_SEND_WITH_INV and related work for RXE in userspace we
choose to renumber the IB_WR enum in the kernel to match the uABI that
userspace has bee using since before Soft RoCE was merged. This is an
overall simpler configuration for the whole software stack, and obviously
can't break anything existing.
Reported-by: NSeth Howell <seth.howell@intel.com>
Tested-by: NSeth Howell <seth.howell@intel.com>
Fixes: 8700e3e7 ("Soft RoCE driver")
Cc: <stable@vger.kernel.org>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

9a59739b

14 9月, 2018 1 次提交

RDMA: Remove duplicated include from ib_addr.h · cb816cd2

由 YueHaibing 提交于 9月 13, 2018

Remove duplicated include.
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

cb816cd2

13 9月, 2018 2 次提交

RDMA/core: Consider net ns of gid attribute for RoCE · 0e9d2c19

由 Parav Pandit 提交于 9月 05, 2018

When resolving destination address or route, when net namespace is
unavailable, refer to the net namespace of the netdevice of the SGID
attribute. This is typically the case for requests arriving from the
network for RoCE ports.
Signed-off-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0e9d2c19

RDMA/core: Rename rdma_copy_addr to rdma_copy_src_l2_addr · 77addc52

由 Parav Pandit 提交于 9月 05, 2018

Now that rdma_copy_addr() only copies the source addresses and all callers
are interested in copying only source addresses, simplify it to drop the
destination address argument.

Given that it only copies source layer2 addresses, rename it to
rdma_copy_src_l2_addr for better code readability.
Signed-off-by: NParav Pandit <parav@mellanox.com>
Reviewed-by: NDaniel Jurgens <danielj@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

77addc52

11 9月, 2018 6 次提交

IB/{hfi1, qib, rdmavt}: Schedule multi RC/UC packets instead of posting · 0b79b277

由 Michael J. Ruhl 提交于 9月 10, 2018

The post_send() path determines if it should post directly or, schedule
the post for later.  The current logic is:

  if the swqe ring is empty or (for hfi1) wqe->length <= piothreshold
    post the send
  else
    schedule

This can allow large requests to call the send engine directly.  Large
requests can potentially produce a large number of packets prior to
returning to the caller, blocking the caller from posting more requests,
and allowing better parallel processing.

Allow the driver(s) more say in this logic (pass call_send to the driver,
rather than examining a return value).

Update hfi1/qib logic to schedule the send engine if an RC or UC message
is larger than the QP MTU size.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0b79b277

RDMA/mlx5: Add flow actions support to raw create flow · fa76d24e

由 Mark Bloch 提交于 9月 06, 2018

Support attaching flow actions to a flow rule via raw create flow.
For now only NIC RX path is supported. This change requires to export
flow resources management functions so we can maintain proper bookkeeping
of flow actions.
Signed-off-by: NMark Bloch <markb@mellanox.com>
Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

fa76d24e

RDMA/uverbs: Move flow resources initialization · 86e1d464

由 Mark Bloch 提交于 9月 06, 2018

Use ib_set_flow() when initializing flow related resources.
Signed-off-by: NMark Bloch <markb@mellanox.com>
Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

86e1d464

IB/uverbs: Add IDRs array attribute type to ioctl() interface · 70cd20ae

由 Guy Levi 提交于 9月 06, 2018

Methods sometimes need to get a flexible set of IDRs and not a strict set
as can be achieved today by the conventional IDR attribute. Add a new
IDRS_ARRAY attribute to the generic uverbs ioctl layer.

IDRS_ARRAY points to array of idrs of the same object type and same access
rights, only write and read are supported.
Signed-off-by: NGuy Levi <guyle@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>``
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

70cd20ae

RDMA/core: Document QP @event_handler function · eb93c82e

由 Chuck Lever 提交于 9月 04, 2018

Add helpful warning for RDMA consumer implementers.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

eb93c82e

RDMA/core: Document CM @event_handler function · 42690246

由 Chuck Lever 提交于 9月 04, 2018

Code audit suggests that the RDMA CM event handler callback function is
_always_ invoked in a context that is safe to block. That's important for
consumer implementers to know, so document that in the comment before
rdma_create_id (where the handler function is set up by the consumer).
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

42690246

07 9月, 2018 1 次提交

RDMA/core: Define client_data_lock as rwlock instead of spinlock · e1f540c3

由 Parav Pandit 提交于 8月 28, 2018

Even though device registration/unregistration and client
registration/unregistration is not a performance path, define the
client_data_lock as rwlock for code clarity.
Signed-off-by: NParav Pandit <parav@mellanox.com>
Reviewed-by: NDaniel Jurgens <danielj@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

e1f540c3

06 9月, 2018 5 次提交

RDMA/core: Depend on device_add() to add device attributes · adee9f3f

由 Parav Pandit 提交于 9月 05, 2018

Instead of adding/removing device attribute files, depend on device_add()
which considers adding these device files based on NULL terminated
attributes group array.
Signed-off-by: NParav Pandit <parav@mellanox.com>
Reviewed-by: NDaniel Jurgens <danielj@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

adee9f3f

RDMA/uverbs: Declare closing variable as boolean · 6ceb6331

由 Leon Romanovsky 提交于 9月 03, 2018

The "closing" variable is used as boolean and set to "true" in one
place, update the declaration of that variable and their other
assignment to proper type.

Fixes: e951747a ("IB/uverbs: Rework the locking for cleaning up the ucontext")
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

6ceb6331

IB/core: Add an unbound WQ type to the new CQ API · f794809a

由 Jack Morgenstein 提交于 8月 27, 2018

The upstream kernel commit cited below modified the workqueue in the
new CQ API to be bound to a specific CPU (instead of being unbound).
This caused ALL users of the new CQ API to use the same bound WQ.

Specifically, MAD handling was severely delayed when the CPU bound
to the WQ was busy handling (higher priority) interrupts.

This caused a delay in the MAD "heartbeat" response handling,
which resulted in ports being incorrectly classified as "down".

To fix this, add a new "unbound" WQ type to the new CQ API, so that users
have the option to choose either a bound WQ or an unbound WQ.

For MADs, choose the new "unbound" WQ.

Fixes: b7363e67 ("IB/device: Convert ib-comp-wq to be CPU-bound")
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.m>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

f794809a

RDMA/uverbs: Add generic function to fill in flow action object · 841eefc5

由 Mark Bloch 提交于 8月 28, 2018

Refactor the initialization of a flow action object to a common function.
Signed-off-by: NMark Bloch <markb@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

841eefc5

RDMA/uverbs: Add UVERBS_ATTR_CONST_IN to the specs language · 0953fffe

由 Mark Bloch 提交于 8月 28, 2018

This makes it clear and safe to access constants passed in from user
space. We define a consistent ABI of u64 for all constants, and verify
that the data passed in can be represented by the type the user supplies.

The expectation is this will always be used with an enum declaring the
constant values, and the user will use the enum type as input to the
accessor.

To retrieve the attribute value we introduce two helper calls - one
standard which may fail if attribute is not valid and one where caller can
provide a default value which will be used in case the attribute is not
valid (useful when attribute is optional).
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NAriel Levkovich <lariel@mellanox.com>
Signed-off-by: NMark Bloch <markb@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

0953fffe

23 8月, 2018 1 次提交

mm, oom: distinguish blockable mode for mmu notifiers · 93065ac7

由 Michal Hocko 提交于 8月 21, 2018

There are several blockable mmu notifiers which might sleep in
mmu_notifier_invalidate_range_start and that is a problem for the
oom_reaper because it needs to guarantee a forward progress so it cannot
depend on any sleepable locks.

Currently we simply back off and mark an oom victim with blockable mmu
notifiers as done after a short sleep.  That can result in selecting a new
oom victim prematurely because the previous one still hasn't torn its
memory down yet.

We can do much better though.  Even if mmu notifiers use sleepable locks
there is no reason to automatically assume those locks are held.  Moreover
majority of notifiers only care about a portion of the address space and
there is absolutely zero reason to fail when we are unmapping an unrelated
range.  Many notifiers do really block and wait for HW which is harder to
handle and we have to bail out though.

This patch handles the low hanging fruit.
__mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
are not allowed to sleep if the flag is set to false.  This is achieved by
using trylock instead of the sleepable lock for most callbacks and
continue as long as we do not block down the call chain.

I think we can improve that even further because there is a common pattern
to do a range lookup first and then do something about that.  The first
part can be done without a sleeping lock in most cases AFAICS.

The oom_reaper end then simply retries if there is at least one notifier
which couldn't make any progress in !blockable mode.  A retry loop is
already implemented to wait for the mmap_sem and this is basically the
same thing.

The simplest way for driver developers to test this code path is to wrap
userspace code which uses these notifiers into a memcg and set the hard
limit to hit the oom.  This can be done e.g.  after the test faults in all
the mmu notifier managed memory and set the hard limit to something really
small.  Then we are looking for a proper process tear down.

[akpm@linux-foundation.org: coding style fixes]
[akpm@linux-foundation.org: minor code simplification]
Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
Reported-by: NDavid Rientjes <rientjes@google.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

93065ac7

18 8月, 2018 1 次提交

RDMA/smc: Replace ib_query_gid with rdma_get_gid_attr · b4c296f9

由 Jason Gunthorpe 提交于 8月 17, 2018

All RDMA ULPs should be using rdma_get_gid_attr instead of
ib_query_gid. Convert SMC to use the new API.

In the process correct some confusion with gid_type - if attr->ndev is
!NULL then gid_type can never be IB_GID_TYPE_IB by
definition. IB_GID_TYPE_ROCE shares the same enum value and is probably
what was intended here.
Reviewed-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

b4c296f9

17 8月, 2018 1 次提交

Revert "net/smc: Replace ib_query_gid with rdma_get_gid_attr" · 92f4e77c

由 Jason Gunthorpe 提交于 8月 15, 2018

This reverts commit ddb457c6.

The include rdma/ib_cache.h is kept, and we have to add a memset
to the compat wrapper to avoid compiler warnings in gcc-7

This revert is done to avoid extensive merge conflicts with SMC
changes in netdev during the 4.19 merge window.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

92f4e77c

13 8月, 2018 3 次提交

J
IB/uverbs: Remove struct uverbs_root_spec and all supporting code · 51d0a2b4
由 Jason Gunthorpe 提交于 8月 09, 2018
```
Everything now uses the uverbs_uapi data structure.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
```
51d0a2b4

IB/uverbs: Use uverbs_api to unmarshal ioctl commands · 3a863577

由 Jason Gunthorpe 提交于 8月 09, 2018

Convert the ioctl method syscall path to use the uverbs_api data
structures. The new uapi structure includes all the same information, just
in a different and more optimal way.

 - Use attr_bkey instead of 2 level radix trees for everything related to
   attributes. This includes the attribute storage, presence, and
   detection of missing mandatory attributes.
 - Avoid iterating over all attribute storage at finish, instead use
   find_first_bit with the attr_bkey to locate only those attrs that need
   cleanup.
 - Organize things to always run, and always rely on, cleanup. This
   avoids a bunch of tricky error unwind cases.
 - Locate the method using the radix tree, and locate the attributes
   using a very efficient incremental radix tree lookup
 - Use the precomputed destroy_bkey to handle uobject destruction
 - Use the precomputed allocation sizes and precomputed 'need_stack'
   to avoid maths in the fast path. This is optimal if userspace
   does not pass (many) unsupported attributes.

Overall this results in much better codegen for the attribute accessors,
everything is now stored in bitmaps or linear arrays indexed by attr_bkey.
The compiler can compute attr_bkey values at compile time for all method
attributes, meaning things like uverbs_attr_is_valid() now compile into
single instruction bit tests.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

3a863577

IB/uverbs: Add a simple allocator to uverbs_attr_bundle · 461bb2ee

由 Jason Gunthorpe 提交于 8月 09, 2018

This is similar in spirit to devm, it keeps track of any allocations
linked to this method call and ensures they are all freed when the method
exits. Further, if there is space in the internal/onstack buffer then the
allocator will hand out that memory and avoid an expensive call to
kalloc/kfree in the syscall path.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>

461bb2ee

11 8月, 2018 5 次提交

IB/uverbs: Remove the ib_uverbs_attr pointer from each attr · 6a1f444f

由 Jason Gunthorpe 提交于 8月 09, 2018

Memory in the bundle is valuable, do not waste it holding an 8 byte
pointer for the rare case of writing to a PTR_OUT. We can compute the
pointer by storing a small 1 byte array offset and the base address of the
uattr memory in the bundle private memory.

This also means we can access the kernel's copy of the ib_uverbs_attr, so
drop the copy of flags as well.

Since the uattr base should be private bundle information this also
de-inlines the already too big uverbs_copy_to inline and moves
create_udata into uverbs_ioctl.c so they can see the private struct
definition.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>

6a1f444f

IB/uverbs: Provide implementation private memory for the uverbs_attr_bundle · 4b3dd2bb

由 Jason Gunthorpe 提交于 8月 09, 2018

This already existed as the anonymous 'ctx' structure, but this was not
really a useful form. Hoist this struct into bundle_priv and rework the
internal things to use it instead.

Move a bunch of the processing internal state into the priv and reduce the
excessive use of function arguments.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>

4b3dd2bb

IB/uverbs: Use uverbs_api to manage the object type inside the uobject · 6b0d08f4

由 Jason Gunthorpe 提交于 8月 09, 2018

Currently the struct uverbs_obj_type stored in the ib_uobject is part of
the .rodata segment of the module that defines the object. This is a
problem if drivers define new uapi objects as we will be left with a
dangling pointer after device disassociation.

Switch the uverbs_obj_type for struct uverbs_api_object, which is
allocated memory that is part of the uverbs_api and is guaranteed to
always exist. Further this moves the 'type_class' into this memory which
means access to the IDR/FD function pointers is also guaranteed. Drivers
cannot define new types.

This makes it safe to continue to use all uobjects, including driver
defined ones, after disassociation.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

6b0d08f4

IB/uverbs: Build the specs into a radix tree at runtime · 9ed3e5f4

由 Jason Gunthorpe 提交于 8月 09, 2018

This radix tree datastructure is intended to replace the 'hash' structure
used today for parsing ioctl methods during system calls. This first
commit introduces the structure and builds it from the existing .rodata
descriptions.

The so-called hash arrangement is actually a 5 level open coded radix tree.
This new version uses a 3 level radix tree built using the radix tree
library.

Overall this is much less code and much easier to build as the radix tree
API allows for dynamic modification during the building. There is a small
memory penalty to pay for this, but since the radix tree is allocated on
a per device basis, a few kb of RAM seems immaterial considering the
gained simplicity.

The radix tree is similar to the existing tree, but also has a 'attr_bkey'
concept, which is a small value'd index for each method attribute. This is
used to simplify and improve performance of everything in the next
patches.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>

9ed3e5f4

IB/uverbs: Have the core code create the uverbs_root_spec · 7d96c9b1

由 Jason Gunthorpe 提交于 8月 09, 2018

There is no reason for drivers to do this, the core code should take of
everything. The drivers will provide their information from rodata to
describe their modifications to the core's base uapi specification.

The core uses this to build up the runtime uapi for each device.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>

7d96c9b1

03 8月, 2018 1 次提交

RDMA/netdev: Use priv_destructor for netdev cleanup · 9f49a5b5

由 Jason Gunthorpe 提交于 7月 29, 2018

Now that the unregister_netdev flow for IPoIB no longer relies on external
code we can now introduce the use of priv_destructor and
needs_free_netdev.

The rdma_netdev flow is switched to use the netdev common priv_destructor
instead of the special free_rdma_netdev and the IPOIB ULP adjusted:
 - priv_destructor needs to switch to point to the ULP's destructor
   which will then call the rdma_ndev's in the right order
 - We need to be careful around the error unwind of register_netdev
   as it sometimes calls priv_destructor on failure
 - ULPs need to use ndo_init/uninit to ensure proper ordering
   of failures around register_netdev

Switching to priv_destructor is a necessary pre-requisite to using
the rtnl new_link mechanism.

The VNIC user for rdma_netdev should also be revised, but that is left for
another patch.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NDenis Drozdov <denisd@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

9f49a5b5

02 8月, 2018 7 次提交

IB/uverbs: Allow all DESTROY commands to succeed after disassociate · 0f50d88a

由 Jason Gunthorpe 提交于 7月 25, 2018

The disassociate function was broken by design because it failed all
commands. This prevents userspace from calling destroy on a uobject after
it has detected a device fatal error and thus reclaiming the resources in
userspace is prevented.

This fix is now straightforward, when anything destroys a uobject that is
not the user the object remains on the IDR with a NULL context and object
pointer. All lookup locking modes other than DESTROY will fail. When the
user ultimately calls the destroy function it is simply dropped from the
IDR while any related information is returned.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0f50d88a

IB/uverbs: Do not pass struct ib_device to the ioctl methods · e83f0ecd

由 Jason Gunthorpe 提交于 7月 25, 2018

This does the same as the patch before, except for ioctl. The rules are
the same, but for the ioctl methods the core code handles setting up the
uobject.

- Retrieve the ib_dev from the uobject->context->device. This is
  safe under ioctl as the core has already done rdma_alloc_begin_uobject
  and so CREATE calls are entirely protected by the rwsem.
- Retrieve the ib_dev from uobject->object
- Call ib_uverbs_get_ucontext()
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

e83f0ecd

IB/uverbs: Do not pass struct ib_device to the write based methods · bbd51e88

由 Jason Gunthorpe 提交于 7月 25, 2018

This is a step to get rid of the global check for disassociation. In this
model, the ib_dev is not proven to be valid by the core code and cannot be
provided to the method. Instead, every method decides if it is able to
run after disassociation and obtains the ib_dev using one of three
different approaches:

- Call srcu_dereference on the udevice's ib_dev. As before, this means
  the method cannot be called after disassociation begins.
  (eg alloc ucontext)
- Retrieve the ib_dev from the ucontext, via ib_uverbs_get_ucontext()
- Retrieve the ib_dev from the uobject->object after checking
  under SRCU if disassociation has started (eg uobj_get)

Largely, the code is all ready for this, the main work is to provide a
ib_dev after calling uobj_alloc(). The few other places simply use
ib_uverbs_get_ucontext() to get the ib_dev.

This flexibility will let the next patches allow destroy to operate
after disassociation.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

bbd51e88

IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate · 7452a3c7

由 Jason Gunthorpe 提交于 7月 25, 2018

After all the recent structural changes this is now straightfoward, hoist
the hw_destroy_rwsem up out of rdma_destroy_explicit and wrap it around
the uobject write lock as well as the destroy.

This is necessary as obtaining a write lock concurrently with
uverbs_destroy_ufile_hw() will cause malfunction.

After this change none of the destroy callbacks require the
disassociate_srcu lock to be correct.

This requires introducing a new lookup mode, UVERBS_LOOKUP_DESTROY as the
IOCTL interface needs to hold an unlocked kref until all command
verification is completed.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

7452a3c7

IB/uverbs: Convert 'bool exclusive' into an enum · 9867f5c6

由 Jason Gunthorpe 提交于 7月 25, 2018

This is more readable, and future patches will need a 3rd lookup type.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

9867f5c6

IB/uverbs: Consolidate uobject destruction · 87ad80ab

由 Jason Gunthorpe 提交于 7月 25, 2018

There are several flows that can destroy a uobject and each one is
minimized and sprinkled throughout the code base, making it difficult to
understand and very hard to modify the destroy path.

Consolidate all of these into uverbs_destroy_uobject() and call it in all
cases where a uobject has to be destroyed.

This makes one change to the lifecycle, during any abort (eg when
alloc_commit is not called) we always call out to alloc_abort, even if
remove_commit needs to be called to delete a HW object.

This also renames RDMA_REMOVE_DURING_CLEANUP to RDMA_REMOVE_ABORT to
clarify its actual usage and revises some of the comments to reflect what
the life cycle is for the type implementation.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

87ad80ab

IB/uverbs: Make the write path destroy methods use the same flow as ioctl · 32ed5c00

由 Jason Gunthorpe 提交于 7月 25, 2018

The ridiculous dance with uobj_remove_commit() is not needed, the write
path can follow the same flow as ioctl - lock and destroy the HW object
then use the data left over in the uobject to form the response to
userspace.

Two helpers are introduced to make this flow straightforward for the
caller.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

32ed5c00

31 7月, 2018 3 次提交

RDMA/core: Return bool instead of int · ca3a8ace

由 Parav Pandit 提交于 7月 29, 2018

Return bool for following internal and inline functions as their
underlying APIs return bool too.

1. cma_zero_addr()
2. cma_loopback_addr()
3. cma_any_addr()
4. ib_addr_any()
5. ib_addr_loopback()

While we are touching cma_loopback_addr(), remove extra white spaces
in it.
Signed-off-by: NParav Pandit <parav@mellanox.com>
Reviewed-by: NDaniel Jurgens <danielj@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

ca3a8ace

RDMA/cma: Constify path record, ib_cm_event, listen_id pointers · e7ff98ae

由 Parav Pandit 提交于 7月 29, 2018

Constify several pointers such as path_rec, ib_cm_event and listen_id
pointers in several functions.
Signed-off-by: NParav Pandit <parav@mellanox.com>
Reviewed-by: NDaniel Jurgens <danielj@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

e7ff98ae

RDMA/core: Constify dst_addr argument · 2df7dba8

由 Parav Pandit 提交于 7月 29, 2018

Following APIs are not supposed to modify addr or dest_addr contents.
Therefore make those function argument const for better code
readability.

1. rdma_resolve_ip()
2. rdma_addr_size()
3. rdma_resolve_addr()
Signed-off-by: NParav Pandit <parav@mellanox.com>
Reviewed-by: NDaniel Jurgens <danielj@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

2df7dba8

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功