- 21 May 2020: 2 commits
-
-
Committed by Gary Leshner
Adds the capability to create a qpn that is recognized as an accelerated UD QP for ipoib. This is accomplished by reserving 0x81 in byte[0] of the qpn as the prefix for these qp types and reserving qpns between 0x810000 and 0x81ffff. The hfi1 capability mask already contained a flag for the VNIC netdev. This has been renamed and extended to include both VNIC and ipoib. The rvt code to allocate qps now recognizes this flag and sets 0x81 into byte[0] of the qpn. The code to allocate qpns is modified to reset the qpn numbering when it detects that a value is present in byte[0] for a UD QP and the qpn is being requested for netdev use. If it is a regular UD QP, then it is allowable to have bits set in byte[0] of the qpn and the previously normal behavior is provided. The code to free the qpn now checks for the AIP prefix value of 0x81 and removes it from the qpn before it is freed, so that the lower 16-bit number can be reused. This patch requires minor changes in the IB core and ipoib to facilitate the creation of accelerated UD QPs.

Link: https://lore.kernel.org/r/20200511160607.173205.11757.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
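A minimal sketch of the prefix scheme described above; the helper names and masks are illustrative, not the hfi1/rvt code itself:

#include <stdbool.h>
#include <stdint.h>

#define AIP_QPN_PREFIX      0x81u        /* reserved byte[0] value */
#define AIP_QPN_PREFIX_MASK 0x00ff0000u  /* byte[0] of the 24-bit qpn */

static bool qpn_is_aip(uint32_t qpn)
{
        /* Accelerated-IPoIB qpns live in 0x810000..0x81ffff. */
        return ((qpn & AIP_QPN_PREFIX_MASK) >> 16) == AIP_QPN_PREFIX;
}

static uint32_t qpn_strip_aip_prefix(uint32_t qpn)
{
        /* Drop the prefix before freeing so the low 16 bits can be reused. */
        return qpn & 0xffffu;
}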
-
Committed by Gary Leshner
This patch implements the mechanism to accelerate the transmit side of a multiple transmit queue RDMA netdev by submitting the packets to the SDMA engine directly instead of sending them through the verbs layer. This patch also changes the UD/SEND_ONLY op to output the entropy value in byte 0 of deth[1]. UD/SEND_ONLY_WITH_IMMEDIATE keeps the previous behavior, with no entropy value being output. Upon successful submission of a tx request, the code in the ipoib rdma netdev calls trace_sdma_output_ibhdr to output the ibhdr to the trace buffer.

Link: https://lore.kernel.org/r/20200511160548.173205.45616.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 18 May 2020: 1 commit
-
-
Committed by Jason Gunthorpe
The uverbs layer largely duplicates the code in ib_create_srq(), with the slight difference that it passes in a udata. Move all the code together into ib_create_srq_user() and provide an inline for kernel users, similar to other create calls.

Link: https://lore.kernel.org/r/20200506082444.14502-6-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
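A sketch of what the kernel-facing inline plausibly looks like, assuming ib_create_srq_user() takes the uverbs object and udata as trailing parameters that kernel callers leave NULL:

static inline struct ib_srq *ib_create_srq(struct ib_pd *pd,
                                           struct ib_srq_init_attr *srq_init_attr)
{
        /* Kernel users have no uobject/udata; the _user variant handles both. */
        return ib_create_srq_user(pd, srq_init_attr, NULL, NULL);
}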
-
- 07 May 2020: 1 commit
-
-
Committed by Mark Zhang
Add two hash functions to distribute the RoCE v2 UDP source port and flow label symmetrically. These are user-visible APIs, and any change in the implementation needs to be tested for interoperability between the old and new variants.

Link: https://lore.kernel.org/r/20200504051935.269708-2-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
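For illustration only (this is not the kernel's exact math), a symmetric combiner: swapping the local and remote QPNs yields the same 20-bit flow label, so both directions of a connection derive the same UDP source port:

#include <stdint.h>

static uint32_t example_flow_label(uint32_t lqpn, uint32_t rqpn)
{
        /* XOR and sum are both order-independent, so the result is the
         * same for (lqpn, rqpn) and (rqpn, lqpn). */
        uint32_t mix = (lqpn ^ rqpn) + ((lqpn + rqpn) << 4);

        return mix & 0xfffffu;          /* GRH flow label is 20 bits */
}

static uint16_t example_udp_sport(uint32_t flow_label)
{
        /* Fold the label into the RoCE v2 dynamic source-port range. */
        return (uint16_t)(0xC000u | (flow_label & 0x3fffu));
}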
-
- 06 May 2020: 1 commit
-
-
Committed by Jason Gunthorpe
When a client is added it isn't allowed to fail, but all the clients have various failure paths within their add routines. This creates the very fringe condition where the client was added, failed during add, and didn't set the client_data. The core code will then still call other client_data centric ops like remove(), rename(), get_nl_info(), and get_net_dev_by_params() with NULL client_data, which is confusing and unexpected. If the add() callback fails, then do not call any more client ops for the device, not even remove. Remove all the now redundant checks for NULL client_data in ops callbacks. Update all the add() callbacks to return error codes appropriately. EOPNOTSUPP is used for cases where the ULP does not support the ib_device, e.g. because it only works with IB.

Link: https://lore.kernel.org/r/20200421172440.387069-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
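A sketch of a ULP add() callback under the new contract; struct my_ulp_dev and my_ulp_client are hypothetical, and the IB-only check is just one example of when -EOPNOTSUPP applies:

#include <linux/slab.h>
#include <rdma/ib_verbs.h>

struct my_ulp_dev { struct ib_device *device; };  /* hypothetical per-device state */
static struct ib_client my_ulp_client;            /* registered elsewhere */

static int my_ulp_add_one(struct ib_device *device)
{
        struct my_ulp_dev *ulp;

        /* This ULP only speaks InfiniBand; decline other devices. */
        if (!rdma_protocol_ib(device, rdma_start_port(device)))
                return -EOPNOTSUPP;

        ulp = kzalloc(sizeof(*ulp), GFP_KERNEL);
        if (!ulp)
                return -ENOMEM;

        ulp->device = device;
        ib_set_client_data(device, &my_ulp_client, ulp);
        return 0;
}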
-
- 03 May 2020: 3 commits
-
-
Committed by Maor Gottlieb
Add a call to rdma_lag_get_ah_roce_slave() when the address handle is created. The lower driver can use it to select the QP's affinity port.

Link: https://lore.kernel.org/r/20200430192146.12863-15-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Maor Gottlieb
Add support for getting the RoCE LAG xmit slave by building an skb of the RoCE packet and calling master_get_xmit_slave(). If a driver wants the slave to be selected assuming all slaves are available, it needs to set RDMA_LAG_FLAGS_HASH_ALL_SLAVES in flags.

Link: https://lore.kernel.org/r/20200430192146.12863-14-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Maor Gottlieb
The following patch adds an additional argument to the create AH function, so it makes sense to group the ah_attr and flags arguments in a struct.

Link: https://lore.kernel.org/r/20200430192146.12863-13-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Acked-by: Devesh Sharma <devesh.sharma@broadcom.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Acked-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 21 February 2020: 1 commit
-
-
Committed by Gustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")

Link: https://lore.kernel.org/r/20200213010425.GA13068@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> # added a few more
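A before/after sketch of the conversion; the struct names are made up for the example, and the struct_size() line is shown only to illustrate that allocation code is unchanged:

struct msg_old {
        int len;
        char payload[0];        /* GNU zero-length array (pre-C99 idiom) */
};

struct msg_new {
        int len;
        char payload[];         /* C99 flexible array member */
};

/* sizeof(struct msg_new) still excludes payload, so allocations such as
 * kmalloc(struct_size(m, payload, n), GFP_KERNEL) are unaffected. */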
-
- 19 February 2020: 1 commit
-
-
Committed by Jason Gunthorpe
This function accepts a udata but does nothing with it, and is never passed a !NULL udata. Rename it to ib_create_qp(), which was its only caller, and remove the udata.

Link: https://lore.kernel.org/r/20200213191911.GA9898@ziepe.ca
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 17 January 2020: 3 commits
-
-
Committed by Michael Guralnik
Add a new relaxed ordering access flag for memory regions. Using memory regions with relaxed ordering set can enhance performance. This access flag is handled in a best-effort manner: drivers should ignore it if they don't support setting relaxed ordering.

Link: https://lore.kernel.org/r/1578506740-22188-9-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
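A sketch of the best-effort contract from the driver side; the capability bit and helper name are hypothetical, only IB_ACCESS_RELAXED_ORDERING comes from the patch:

#include <rdma/ib_verbs.h>

#define MY_DRV_CAP_RELAXED_ORDERING 0x1  /* hypothetical device capability bit */

static void my_drv_filter_mr_access(u32 device_caps, int *access_flags)
{
        /* Optional flag: a provider that cannot honour relaxed ordering
         * silently drops the bit rather than failing the registration. */
        if (!(device_caps & MY_DRV_CAP_RELAXED_ORDERING))
                *access_flags &= ~IB_ACCESS_RELAXED_ORDERING;
}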
-
Committed by Michael Guralnik
Define a range of access flags that are considered optional; both uverbs and drivers should accept them and use them if they are applicable. This will be used, for example, for the relaxed ordering access flag, which drivers that don't support it can ignore.

Link: https://lore.kernel.org/r/1578506740-22188-7-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Michael Guralnik
Verify that the MR access flags passed from userspace are all supported ones; otherwise return an error.

Fixes: 4fca0377 ("IB/uverbs: Move ib_access_flags and ib_read_counters_flags to uapi")
Link: https://lore.kernel.org/r/1578506740-22188-6-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 16 January 2020: 2 commits
-
-
Committed by Moni Shoua
Allow ULPs to call advise_mr, so they can control ODP regions in the same way as user space applications.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
-
Committed by Moni Shoua
Add ib_reg_user_mr() for kernel ULPs to register user MRs. The common use case for this function is a userspace application that allocates memory for HCA access, while the responsibility to register the memory at the HCA lies with a kernel ULP. This ULP acts as an agent for the userspace application. The function is intended to be used without a user context, so vendor drivers need to be aware that the reg_user_mr() device operation may be called with udata equal to NULL. Among all drivers, i40iw is the only one which relies on the presence of udata, so check for udata's existence for that driver.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
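A sketch of how a kernel ULP might use the new helper, assuming it takes (pd, start, length, virt_addr, access_flags) as described above; the wrapper name and chosen flags are illustrative:

#include <rdma/ib_verbs.h>

static struct ib_mr *agent_register_user_buf(struct ib_pd *pd,
                                             u64 user_addr, u64 length)
{
        /* Register a buffer owned by a userspace process on its behalf. */
        return ib_reg_user_mr(pd, user_addr, length, user_addr,
                              IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_READ);
}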
-
- 14 January 2020: 4 commits
-
-
Committed by Jason Gunthorpe
This is a struct ib_uwq_object pointer; instead of using container_of() all over the place, just store it with its actual type.

Link: https://lore.kernel.org/r/1578504126-9400-10-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
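A generic illustration of the pattern used by this and the following three commits; the example_wq struct is made up:

struct ib_uwq_object;   /* uverbs wrapper type, defined elsewhere */

struct example_wq {
        /* before: a plain struct ib_uobject * that every caller had to
         * convert with container_of();
         * after: the pointer keeps its real type and no cast is needed. */
        struct ib_uwq_object *uobject;
};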
-
Committed by Jason Gunthorpe
This is a struct ib_usrq_object pointer; instead of using container_of() all over the place, just store it with its actual type.

Link: https://lore.kernel.org/r/1578504126-9400-9-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Jason Gunthorpe
This is a struct ib_uqp_object pointer; instead of using container_of() all over the place, just store it with its actual type.

Link: https://lore.kernel.org/r/1578504126-9400-8-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Jason Gunthorpe
This is a struct ib_ucq_object pointer; instead of using container_of() all over the place, just store it with its actual type.

Link: https://lore.kernel.org/r/1578504126-9400-7-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 08 January 2020: 4 commits
-
-
Committed by Parav Pandit
This lock is used to protect the qp->open_list linked list. As a side effect it seems to also globally serialize the qp event_handler, but it isn't clear if that is a deliberate design.

Link: https://lore.kernel.org/r/20191212113024.336702-5-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Parav Pandit
Given that the ib_cache structure has only a single member now, merge the cache lock directly into the ib_device.

Link: https://lore.kernel.org/r/20191212113024.336702-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Parav Pandit
Currently when the low level driver notifies Pkey, GID, and port change events, they are notified to the registered handlers in the order they are registered. IB core and other ULPs such as IPoIB are interested in GID, LID, and Pkey change events. Since all GID queries done by ULPs are serviced by IB core, and the IB core defers cache updates to a work queue, it is possible for other clients to see stale cache data when they handle their own events. For example, the call tree below shows how ipoib will call rdma_query_gid() concurrently with the update to the cache sitting in the WQ.

mlx5_ib_handle_event()
  ib_dispatch_event()
    ib_cache_event()
      queue_work() -> slow cache update
    [..]
    ipoib_event()
      queue_work()
      [..]
      work handler
        ipoib_ib_dev_flush_light()
          __ipoib_ib_dev_flush()
            ipoib_dev_addr_changed_valid()
              rdma_query_gid() <- Returns old GID, cache not updated.

Move all the event dispatch to a work queue so that the cache update is always done before any clients are notified.

Fixes: f35faa4b ("IB/core: Simplify ib_query_gid to always refer to cache")
Link: https://lore.kernel.org/r/20191212113024.336702-3-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
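An illustrative sketch of the deferred-dispatch idea; the struct and helper names below are hypothetical, not the core's exact symbols:

#include <linux/slab.h>
#include <linux/workqueue.h>
#include <rdma/ib_verbs.h>

/* hypothetical helpers, implemented elsewhere in this sketch */
void update_gid_pkey_cache(struct ib_device *dev, u8 port_num);
void notify_registered_handlers(const struct ib_event *event);

struct deferred_event {
        struct work_struct work;
        struct ib_event event;
};

static void deferred_event_handler(struct work_struct *work)
{
        struct deferred_event *de = container_of(work, struct deferred_event, work);

        /* The cache update runs first, so every handler that later queries
         * GIDs, Pkeys, or LIDs sees fresh data. */
        update_gid_pkey_cache(de->event.device, de->event.element.port_num);
        notify_registered_handlers(&de->event);
        kfree(de);
}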
-
Committed by Chuck Lever
Sample trace events:

kworker/u29:0-300 [007] 120.042217: cq_alloc:    cq.id=4 nr_cqe=161 comp_vector=2 poll_ctx=WORKQUEUE
<idle>-0          [002] 120.056292: cq_schedule: cq.id=4
kworker/2:1H-482  [002] 120.056402: cq_process:  cq.id=4 wake-up took 109 [us] from interrupt
kworker/2:1H-482  [002] 120.056407: cq_poll:     cq.id=4 requested 16, returned 1
<idle>-0          [002] 120.067503: cq_schedule: cq.id=4
kworker/2:1H-482  [002] 120.067537: cq_process:  cq.id=4 wake-up took 34 [us] from interrupt
kworker/2:1H-482  [002] 120.067541: cq_poll:     cq.id=4 requested 16, returned 1
<idle>-0          [002] 120.067657: cq_schedule: cq.id=4
kworker/2:1H-482  [002] 120.067672: cq_process:  cq.id=4 wake-up took 15 [us] from interrupt
kworker/2:1H-482  [002] 120.067674: cq_poll:     cq.id=4 requested 16, returned 1
...
systemd-1         [002] 122.392653: cq_schedule: cq.id=4
kworker/2:1H-482  [002] 122.392688: cq_process:  cq.id=4 wake-up took 35 [us] from interrupt
kworker/2:1H-482  [002] 122.392693: cq_poll:     cq.id=4 requested 16, returned 16
kworker/2:1H-482  [002] 122.392836: cq_poll:     cq.id=4 requested 16, returned 16
kworker/2:1H-482  [002] 122.392970: cq_poll:     cq.id=4 requested 16, returned 16
kworker/2:1H-482  [002] 122.393083: cq_poll:     cq.id=4 requested 16, returned 16
kworker/2:1H-482  [002] 122.393195: cq_poll:     cq.id=4 requested 16, returned 3

Several features to note in this output:
- The WCE count and context type are reported at allocation time
- The CPU and kworker for each CQ is evident
- The CQ's restracker ID is tagged on each trace event
- CQ poll scheduling latency is measured
- Details about how often single completions occur versus multiple completions are evident
- The cost of the ULP's completion handler is recorded

Link: https://lore.kernel.org/r/20191218201815.30584.3481.stgit@manet.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 13 December 2019: 1 commit
-
-
Committed by Yishai Hadas
Introduce the rdma_user_mmap_entry_insert_range() API, to be used when the required key for the given entry should fall within a given range.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212100237.330654-2-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
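A usage sketch, assuming the new helper mirrors rdma_user_mmap_entry_insert() with extra minimum/maximum page-offset bounds for the key; the wrapper and the choice to reserve pgoff 0 are illustrative:

#include <rdma/ib_verbs.h>

static int insert_bar_entry(struct ib_ucontext *uctx,
                            struct rdma_user_mmap_entry *entry)
{
        /* Constrain the allocated key so offset 0 stays reserved for a
         * fixed legacy mapping in this (made-up) driver. */
        return rdma_user_mmap_entry_insert_range(uctx, entry, PAGE_SIZE,
                                                 1, U32_MAX);
}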
-
- 24 November 2019: 1 commit
-
-
Committed by Jason Gunthorpe
Replace the internal interval tree based mmu notifier with the new common mmu_interval_notifier_insert() API. This removes a lot of code and fixes a deadlock that can be triggered in ODP:

zap_page_range()
  mmu_notifier_invalidate_range_start()
    [..]
    ib_umem_notifier_invalidate_range_start()
      down_read(&per_mm->umem_rwsem)
  unmap_single_vma()
    [..]
    __split_huge_page_pmd()
      mmu_notifier_invalidate_range_start()
        [..]
        ib_umem_notifier_invalidate_range_start()
          down_read(&per_mm->umem_rwsem)   // DEADLOCK
      mmu_notifier_invalidate_range_end()
        up_read(&per_mm->umem_rwsem)
  mmu_notifier_invalidate_range_end()
    up_read(&per_mm->umem_rwsem)

The umem_rwsem is held across the range_start/end as the ODP algorithm for invalidate_range_end cannot tolerate changes to the interval tree. However, due to the nested invalidation regions the second down_read() can deadlock if there are competing writers. The new core code provides an alternative scheme to solve this problem.

Fixes: ca748c39 ("RDMA/umem: Get rid of per_mm->notifier_count")
Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca
Tested-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 23 November 2019: 1 commit
-
-
Committed by Danit Goldberg
Provide the ability to get the node and port GUIDs of VFs, symmetrical to the already existing set option.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
-
- 13 November 2019: 1 commit
-
-
Committed by Leon Romanovsky
All users of process_mad() convert the input pointers from ib_mad_hdr to ib_mad, so update the function declaration to use ib_mad directly. Also remove the unused input MAD size parameter.

Link: https://lore.kernel.org/r/20191029062745.7932-17-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 07 November 2019: 2 commits
-
-
Committed by Michal Kalderon
The rdma_user_mmap_io interface created a common interface for drivers to correctly map hw resources and zap them once the ucontext is destroyed, enabling the drivers to safely free the hw resources. However, this meant the drivers needed to delay freeing the resource to the ucontext destroy phase to ensure it was no longer mapped. The new mechanism for a common way of handling user/driver address mapping enables notifying the driver when all umap_priv mappings have been removed, so the hw resources can be freed as soon as they are done with, instead of delaying until ucontext destroy. Since not all drivers use the mechanism, NULL can be passed to the rdma_user_mmap_io interface to continue working as before. Drivers that use the mmap_xa interface can pass the entry being mapped to the rdma_user_mmap_io function so the two are linked together.

Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Michal Kalderon
Create some common APIs for adding entries to an xa_mmap, searching for an entry, and freeing one. The general approach is copied from the EFA driver and improved to be more general and to do more to help the drivers. Integration with the core allows a reference counted scheme with a free function, so that the driver can know when its mmaps are all gone. This significant new functionality will be helpful for drivers to have the correct lifetime model for mmap objects.

Link: https://lore.kernel.org/r/20191030094417.16866-3-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
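A sketch of a driver mmap() handler built on these helpers, assuming the entry get/put and rdma_user_mmap_io(..., entry) signatures described in these two commits; the my_entry container and to_my_entry() conversion are hypothetical:

#include <linux/mm.h>
#include <rdma/ib_verbs.h>

struct my_entry {
        struct rdma_user_mmap_entry rdma_entry;
        unsigned long pfn;
};

static inline struct my_entry *to_my_entry(struct rdma_user_mmap_entry *e)
{
        return container_of(e, struct my_entry, rdma_entry);
}

static int my_drv_mmap(struct ib_ucontext *uctx, struct vm_area_struct *vma)
{
        struct rdma_user_mmap_entry *rentry;
        struct my_entry *entry;
        int ret;

        rentry = rdma_user_mmap_entry_get(uctx, vma);
        if (!rentry)
                return -EINVAL;
        entry = to_my_entry(rentry);

        /* Passing the entry lets the core drop its reference on zap/unmap,
         * so the driver can free the hw resource as soon as it is unused. */
        ret = rdma_user_mmap_io(uctx, vma, entry->pfn,
                                vma->vm_end - vma->vm_start,
                                pgprot_noncached(vma->vm_page_prot), rentry);

        rdma_user_mmap_entry_put(rentry);
        return ret;
}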
-
- 29 October 2019: 1 commit
-
-
Committed by Bart Van Assche
If dev->dma_device->params == NULL then the maximum DMA segment size is 64 KB. See also the dma_get_max_seg_size() implementation. This patch fixes the following kernel warning:

DMA-API: infiniband rxe0: mapping sg segment longer than device claims to support [len=126976] [max=65536]
WARNING: CPU: 4 PID: 4848 at kernel/dma/debug.c:1220 debug_dma_map_sg+0x3d9/0x450
RIP: 0010:debug_dma_map_sg+0x3d9/0x450
Call Trace:
 srp_queuecommand+0x626/0x18d0 [ib_srp]
 scsi_queue_rq+0xd02/0x13e0 [scsi_mod]
 __blk_mq_try_issue_directly+0x2b3/0x3f0
 blk_mq_request_issue_directly+0xac/0xf0
 blk_insert_cloned_request+0xdf/0x170
 dm_mq_queue_rq+0x43d/0x830 [dm_mod]
 __blk_mq_try_issue_directly+0x2b3/0x3f0
 blk_mq_request_issue_directly+0xac/0xf0
 blk_mq_try_issue_list_directly+0xb8/0x170
 blk_mq_sched_insert_requests+0x23c/0x3b0
 blk_mq_flush_plug_list+0x529/0x730
 blk_flush_plug_list+0x21f/0x260
 blk_mq_make_request+0x56b/0xf20
 generic_make_request+0x196/0x660
 submit_bio+0xae/0x290
 blkdev_direct_IO+0x822/0x900
 generic_file_direct_write+0x110/0x200
 __generic_file_write_iter+0x124/0x2a0
 blkdev_write_iter+0x168/0x270
 aio_write+0x1c4/0x310
 io_submit_one+0x971/0x1390
 __x64_sys_io_submit+0x12a/0x390
 do_syscall_64+0x6f/0x2e0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Link: https://lore.kernel.org/r/20191025225830.257535-2-bvanassche@acm.org
Cc: <stable@vger.kernel.org>
Fixes: 0b5cb330 ("RDMA/srp: Increase max_segment_size")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
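A small sketch of the underlying rule rather than the patch itself: a device that never advertises a larger limit is treated as supporting only 64 KB segments, so a driver whose hardware can do better should say so; the 2 GB value is just an example:

#include <linux/dma-mapping.h>
#include <linux/sizes.h>

static int advertise_seg_size(struct device *dma_dev)
{
        /* Without this, dma_get_max_seg_size() falls back to 64 KB and the
         * DMA-API debug code warns on larger scatterlist segments. */
        return dma_set_max_seg_size(dma_dev, SZ_2G);
}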
-
- 23 October 2019: 4 commits
-
-
Committed by Erez Alfasi
Add an RDMA nldev netlink interface for dumping MR statistics information.

Output example:

$ ./ibv_rc_pingpong -o -P -s 500000000
  local address:  LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID ::
$ rdma stat show mr
dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0

Link: https://lore.kernel.org/r/20191016062308.11886-5-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Erez Alfasi
Introduce ODP diagnostic counters and count the following per MR within the IB/mlx5 driver:

1) Page faults: Total number of faulted pages.
2) Page invalidations: Total number of pages invalidated by the OS during all invalidation events. The translations can be no longer valid due to either non-present pages or mapping changes.

Link: https://lore.kernel.org/r/20191016062308.11886-2-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Dan Carpenter
The issue is in drivers/infiniband/core/uverbs_std_types_cq.c in the UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE) function. We check that:

if (attr.comp_vector >= attrs->ufile->device->num_comp_vectors) {

But we don't check whether "attr.comp_vector" is negative. It could potentially lead to an array underflow. My concern would be where cq->vector is used in the create_cq() function from the cxgb4 driver. And really, "attr.comp_vector" appears as a u32 to user space, so that's the right type to use.

Fixes: 9ee79fce ("IB/core: Add completion queue (cq) object actions")
Link: https://lore.kernel.org/r/20191011133419.GA22905@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
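A self-contained toy illustration of the underflow class being closed here: with a signed index a negative value can slip past the upper-bound check, while an unsigned comp_vector makes the single >= comparison sufficient:

#include <stdint.h>

static int pick_vector(uint32_t comp_vector, uint32_t num_comp_vectors)
{
        /* Had comp_vector been a signed int compared against a signed
         * limit, a negative value would pass the ">= limit" check and
         * then index before the start of the vector array. */
        if (comp_vector >= num_comp_vectors)
                return -1;              /* out of range */
        return (int)comp_vector;        /* safe array index */
}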
-
Committed by Yamin Friedman
If there are more scatter entries than the recommended limit provided by the ib device, UMR registration is used. This will provide optimal performance when performing large RDMA READs over devices that advertise the threshold capability.

With ConnectX-5 running NVMeoF RDMA with FIO, single QP, 128KB writes:
Without use of cap: 70Gb/sec
With use of cap: 84Gb/sec

Link: https://lore.kernel.org/r/20191007135933.12483-3-leon@kernel.org
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 22 August 2019: 2 commits
-
-
Committed by Jason Gunthorpe
This is a significant simplification: no extra list is kept per FD, and the interval tree is now shared between all the ucontexts, reducing overhead if there are multiple ucontexts active.

Link: https://lore.kernel.org/r/20190806231548.25242-7-jgg@ziepe.ca
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Moni Shoua
The callback function 'invalidate_range' is implemented in a driver, so the place for it is in the ib_device_ops structure and not in ib_ucontext.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: https://lore.kernel.org/r/20190819111710.18440-11-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 21 August 2019: 1 commit
-
-
Committed by Leon Romanovsky
There is no need to keep DEBUG defines for out-of-tree testing.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190819114547.20704-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 12 August 2019: 1 commit
-
-
Committed by Kamal Heib
In order to improve readability, add an ib_port_phys_state enum to replace the use of magic numbers.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Andrew Boyer <aboyer@tobark.org>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190807103138.17219-2-kamalheib1@gmail.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
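A sketch of what such an enum looks like; the values follow the IBTA PortPhysicalState encoding, but the exact member names in the patch may differ:

enum ib_port_phys_state {
        IB_PORT_PHYS_STATE_SLEEP = 1,
        IB_PORT_PHYS_STATE_POLLING = 2,
        IB_PORT_PHYS_STATE_DISABLED = 3,
        IB_PORT_PHYS_STATE_PORT_CONFIGURATION_TRAINING = 4,
        IB_PORT_PHYS_STATE_LINK_UP = 5,
        IB_PORT_PHYS_STATE_LINK_ERROR_RECOVERY = 6,
        IB_PORT_PHYS_STATE_PHY_TEST = 7,
};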
-
- 06 August 2019: 1 commit
-
-
Committed by Gal Pressman
Add ratelimited helpers to the ibdev_* printk functions. The implementation is inspired by the counterpart dev_*_ratelimited functions.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190801171447.54440-2-galpress@amazon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
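A usage sketch, assuming the _ratelimited variants mirror the existing ibdev_* helpers; the function, device pointer, and message are illustrative:

#include <rdma/ib_verbs.h>

static void report_bad_opcode(struct ib_device *ibdev, u32 opcode)
{
        /* A per-packet error path can adopt rate limiting with no other
         * changes to the call site. */
        ibdev_warn_ratelimited(ibdev, "dropping packet: bad opcode 0x%x\n",
                               opcode);
}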
-
- 05 August 2019: 1 commit
-
-
Committed by Chuck Lever
Send and Receive completion is handled on a single CPU selected at the time each Completion Queue is allocated. Typically this is when an initiator instantiates an RDMA transport, or when a target accepts an RDMA connection. Some ULPs cannot open a connection per CPU to spread completion workload across available CPUs and MSI vectors. For such ULPs, provide an API that allows the RDMA core to select a completion vector based on the device's complement of available comp_vecs. ULPs that invoke ib_alloc_cq() with only comp_vector 0 are converted to use the new API so that their completion workloads interfere less with each other.

Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Cc: <linux-cifs@vger.kernel.org>
Cc: <v9fs-developer@lists.sourceforge.net>
Link: https://lore.kernel.org/r/20190729171923.13428.52555.stgit@manet.1015granger.net
Signed-off-by: Doug Ledford <dledford@redhat.com>
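A sketch of converting a ULP that previously hard-coded comp_vector 0, assuming the new helper takes the same arguments as ib_alloc_cq() minus the completion vector:

#include <rdma/ib_verbs.h>

static struct ib_cq *ulp_create_cq(struct ib_device *device, void *ctx,
                                   int nr_cqe)
{
        /* before: ib_alloc_cq(device, ctx, nr_cqe, 0, IB_POLL_SOFTIRQ);
         * after: let the core spread CQs across the device's comp vectors. */
        return ib_alloc_cq_any(device, ctx, nr_cqe, IB_POLL_SOFTIRQ);
}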
-