提交 · 909624d8db5bbd369690749eb4c4392766f39f94 · openeuler / Kernel

05 10月, 2019 1 次提交

IB/cm: Use container_of() instead of typecast · 909624d8

由 Parav Pandit 提交于 10月 02, 2019

Use container_of() macro to get to timewait info structure instead of
typecasting.

Link: https://lore.kernel.org/r/20191002122517.17721-5-leon@kernel.orgSigned-off-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

909624d8

02 10月, 2019 1 次提交

RDMA/core: Fix return code when modify_device isn't supported · d0f3ef36

由 Kamal Heib 提交于 9月 23, 2019

The proper return code is "-EOPNOTSUPP" when modify_device callback is not
supported.

Link: https://lore.kernel.org/r/20190923104158.5331-2-kamalheib1@gmail.comSigned-off-by: NKamal Heib <kamalheib1@gmail.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d0f3ef36

01 10月, 2019 1 次提交

RDMA/counter: Prevent QP counter manual binding in auto mode · 663912a6

由 Mark Zhang 提交于 9月 16, 2019

If auto mode is configured, manual counter allocation and QP bind is not
allowed.

Fixes: 1bd8e0a9 ("RDMA/counter: Allow manual mode configuration support")
Link: https://lore.kernel.org/r/20190916071154.20383-3-leon@kernel.orgSigned-off-by: NMark Zhang <markz@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

663912a6

25 9月, 2019 1 次提交

mm/gup: add make_dirty arg to put_user_pages_dirty_lock() · 2d15eb31

由 akpm@linux-foundation.org 提交于 9月 23, 2019

[11~From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()",
v3.

There are about 50+ patches in my tree [2], and I'll be sending out the
remaining ones in a few more groups:

* The block/bio related changes (Jerome mostly wrote those, but I've had
  to move stuff around extensively, and add a little code)

* mm/ changes

* other subsystem patches

* an RFC that shows the current state of the tracking patch set.  That
  can only be applied after all call sites are converted, but it's good to
  get an early look at it.

This is part a tree-wide conversion, as described in fc1d8e7c ("mm:
introduce put_user_page*(), placeholder versions").

This patch (of 3):

Provide more capable variation of put_user_pages_dirty_lock(), and delete
put_user_pages_dirty().  This is based on the following:

1.  Lots of call sites become simpler if a bool is passed into
   put_user_page*(), instead of making the call site choose which
   put_user_page*() variant to call.

2.  Christoph Hellwig's observation that set_page_dirty_lock() is
   usually correct, and set_page_dirty() is usually a bug, or at least
   questionable, within a put_user_page*() calling chain.

This leads to the following API choices:

    * put_user_pages_dirty_lock(page, npages, make_dirty)

    * There is no put_user_pages_dirty(). You have to
      hand code that, in the rare case that it's
      required.

[jhubbard@nvidia.com: remove unused variable in siw_free_plist()]
  Link: http://lkml.kernel.org/r/20190729074306.10368-1-jhubbard@nvidia.com
Link: http://lkml.kernel.org/r/20190724044537.10458-2-jhubbard@nvidia.comSigned-off-by: NJohn Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2d15eb31

21 9月, 2019 1 次提交

ipv4: Revert removal of rt_uses_gateway · 77d5bc7e

由 David Ahern 提交于 9月 17, 2019

Julian noted that rt_uses_gateway has a more subtle use than 'is gateway
set':
    https://lore.kernel.org/netdev/alpine.LFD.2.21.1909151104060.2546@ja.home.ssi.bg/

Revert that part of the commit referenced in the Fixes tag.

Currently, there are no u8 holes in 'struct rtable'. There is a 4-byte hole
in the second cacheline which contains the gateway declaration. So move
rt_gw_family down to the gateway declarations since they are always used
together, and then re-use that u8 for rt_uses_gateway. End result is that
rtable size is unchanged.

Fixes: 1550c171 ("ipv4: Prepare rtable for IPv6 gateway")
Reported-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Reviewed-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>

77d5bc7e

17 9月, 2019 1 次提交

RDMA: Fix double-free in srq creation error flow · 3eca7fc2

由 Jack Morgenstein 提交于 9月 16, 2019

The cited commit introduced a double-free of the srq buffer in the error
flow of procedure __uverbs_create_xsrq().

The problem is that ib_destroy_srq_user() called in the error flow also
frees the srq buffer.

Thus, if uverbs_response() fails in __uverbs_create_srq(), the srq buffer
will be freed twice.

Cc: <stable@vger.kernel.org>
Fixes: 68e326de ("RDMA: Handle SRQ allocations by IB/core")
Link: https://lore.kernel.org/r/20190916071154.20383-5-leon@kernel.orgSigned-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

3eca7fc2

14 9月, 2019 2 次提交

RDMA/odp: Add missing cast for 32 bit · b97b218b

由 Jason Gunthorpe 提交于 9月 08, 2019

length is a size_t which is unsigned int on 32 bit:

../drivers/infiniband/core/umem_odp.c: In function 'ib_init_umem_odp':
../include/linux/overflow.h:59:15: warning: comparison of distinct pointer types lacks a cast
   59 |  (void) (&__a == &__b);   \
      |               ^~
../drivers/infiniband/core/umem_odp.c:220:7: note: in expansion of macro 'check_add_overflow'

Fixes: 204e3e56 ("RDMA/odp: Check for overflow when computing the umem_odp end")
Link: https://lore.kernel.org/r/20190908080726.30017-1-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

b97b218b

RDMA/cma: Fix false error message · a6e4d254

由 Håkon Bugge 提交于 9月 02, 2019

In addr_handler(), assuming status == 0 and the device already has been
acquired (id_priv->cma_dev != NULL), we get the following incorrect
"error" message:

RDMA CM: ADDR_ERROR: failed to resolve IP. status 0

Fixes: 498683c6 ("IB/cma: Add debug messages to error flows")
Link: https://lore.kernel.org/r/20190902092731.1055757-1-haakon.bugge@oracle.comSigned-off-by: NHåkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

a6e4d254

28 8月, 2019 1 次提交

RDMA/iwpm: Delete unnecessary checks before the macro call "dev_kfree_skb" · fd1a52f3

由 Markus Elfring 提交于 8月 21, 2019

The dev_kfree_skb() function performs also input parameter validation.
Thus the test around the shown calls is not needed.

This issue was detected by using the Coccinelle software.

Link: https://lore.kernel.org/r/16df4c50-1f61-d7c4-3fc8-3073666d281d@web.deSigned-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

fd1a52f3

22 8月, 2019 12 次提交

RDMA/odp: remove ib_ucontext from ib_umem · 47f725ee

由 Jason Gunthorpe 提交于 8月 06, 2019

At this point the ucontext is only being stored to access the ib_device,
so just store the ib_device directly instead. This is more natural and
logical as the umem has nothing to do with the ucontext.

Link: https://lore.kernel.org/r/20190806231548.25242-8-jgg@ziepe.caSigned-off-by: NJason Gunthorpe <jgg@mellanox.com>

47f725ee

RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm' · c571feca

由 Jason Gunthorpe 提交于 8月 06, 2019

This is a significant simplification, no extra list is kept per FD, and
the interval tree is now shared between all the ucontexts, reducing
overhead if there are multiple ucontexts active.

Link: https://lore.kernel.org/r/20190806231548.25242-7-jgg@ziepe.caSigned-off-by: NJason Gunthorpe <jgg@mellanox.com>

c571feca

RDMA/core: Make invalidate_range a device operation · ce51346f

由 Moni Shoua 提交于 8月 19, 2019

The callback function 'invalidate_range' is implemented in a driver so the
place for it is in the ib_device_ops structure and not in ib_ucontext.
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Reviewed-by: NGuy Levi <guyle@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Link: https://lore.kernel.org/r/20190819111710.18440-11-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

ce51346f

RDMA/odp: Use kvcalloc for the dma_list and page_list · 37824952

由 Jason Gunthorpe 提交于 8月 19, 2019

There is no specific need for these to be in the valloc space, let the
system decide automatically how to do the allocation.

Link: https://lore.kernel.org/r/20190819111710.18440-10-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

37824952

RDMA/odp: Check for overflow when computing the umem_odp end · 204e3e56

由 Jason Gunthorpe 提交于 8月 19, 2019

Since the page size can be extended in the ODP case by IB_ACCESS_HUGETLB
the existing overflow checks done by ib_umem_get() are not
sufficient. Check for overflow again.

Further, remove the unchecked math from the inlines and just use the
precomputed value stored in the interval_tree_node.

Link: https://lore.kernel.org/r/20190819111710.18440-9-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

204e3e56

RDMA/odp: Provide ib_umem_odp_release() to undo the allocs · 0446cad9

由 Jason Gunthorpe 提交于 8月 19, 2019

Now that there are allocator APIs that return the ib_umem_odp directly
it should be freed through a umem_odp free'er as well.

Link: https://lore.kernel.org/r/20190819111710.18440-8-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0446cad9

RDMA/odp: Split creating a umem_odp from ib_umem_get · 261dc53f

由 Jason Gunthorpe 提交于 8月 19, 2019

This is the last creation API that is overloaded for both, there is very
little code sharing and a driver has to be specifically ready for a
umem_odp to be created to use the odp version.

Link: https://lore.kernel.org/r/20190819111710.18440-7-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

261dc53f

RDMA/odp: Make the three ways to create a umem_odp clear · f20bef6a

由 Jason Gunthorpe 提交于 8月 19, 2019

The three paths to build the umem_odps are kind of muddled, they are:
- As a normal ib_mr umem
- As a child in an implicit ODP umem tree
- As the root of an implicit ODP umem tree

Only the first two are actually umem's, the last is an abuse.

The implicit case can only be triggered by explicit driver request, it
should never be co-mingled with the normal case. While we are here, make
sensible function names and add some comments to make this clearer.

Link: https://lore.kernel.org/r/20190819111710.18440-6-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

f20bef6a

RMDA/odp: Consolidate umem_odp initialization · 22d79c9a

由 Jason Gunthorpe 提交于 8月 19, 2019

This is done in two different places, consolidate all the post-allocation
initialization into a single function.

Link: https://lore.kernel.org/r/20190819111710.18440-5-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

22d79c9a

RDMA/odp: Make it clearer when a umem is an implicit ODP umem · fd7dbf03

由 Jason Gunthorpe 提交于 8月 19, 2019

Implicit ODP umems are special, they don't have any page lists, they don't
exist in the interval tree and they are never DMA mapped.

Instead of trying to guess this based on a zero length use an explicit
flag.

Further, do not allow non-implicit umems to be 0 size.

Link: https://lore.kernel.org/r/20190819111710.18440-4-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

fd7dbf03

RDMA/odp: Iterate over the whole rbtree directly · f993de88

由 Jason Gunthorpe 提交于 8月 19, 2019

Instead of intersecting a full interval, just iterate over every element
directly. This is faster and clearer.

Link: https://lore.kernel.org/r/20190819111710.18440-3-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

f993de88

RDMA/odp: Use the common interval tree library instead of generic · 7cc2e18f

由 Jason Gunthorpe 提交于 8月 19, 2019

ODP is working with userspace VA's in the interval tree which always fit
into an unsigned long, so we can use the common code.

This comes at a cost of a 16 byte increase in ib_umem_odp struct size due
to storing the interval tree start/last in addition to the umem
addr/length. However these values were computed and are performance
critical for the interval lookup, so this seems like a worthwhile trade
off.

Removes 2k of .text from the kernel.

Link: https://lore.kernel.org/r/20190819111710.18440-2-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

7cc2e18f

21 8月, 2019 6 次提交

RDMA/cma: fix null-ptr-deref Read in cma_cleanup · a7bfb93f

由 zhengbin 提交于 8月 19, 2019

In cma_init, if cma_configfs_init fails, need to free the
previously memory and return fail, otherwise will trigger
null-ptr-deref Read in cma_cleanup.

cma_cleanup
  cma_configfs_exit
    configfs_unregister_subsystem

Fixes: 045959db ("IB/cma: Add configfs for rdma_cm")
Reported-by: NHulk Robot <hulkci@huawei.com>
Signed-off-by: Nzhengbin <zhengbin13@huawei.com>
Reviewed-by: NParav Pandit <parav@mellanox.com>
Link: https://lore.kernel.org/r/1566188859-103051-1-git-send-email-zhengbin13@huawei.comSigned-off-by: NDoug Ledford <dledford@redhat.com>

a7bfb93f

RDMA/restrack: Rewrite PID namespace check to be reliable · 60c78668

由 Leon Romanovsky 提交于 8月 15, 2019

task_active_pid_ns() is wrong API to check PID namespace because it
posses some restrictions and return PID namespace where the process
was allocated. It created mismatches with current namespace, which
can be different.

Rewrite whole rdma_is_visible_in_pid_ns() logic to provide reliable
results without any relation to allocated PID namespace.

Fixes: 8be565e6 ("RDMA/nldev: Factor out the PID namespace check")
Fixes: 6a6c306a ("RDMA/restrack: Make is_visible_in_pid_ns() as an API")
Reviewed-by: NMark Zhang <markz@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-4-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

60c78668

RDMA/counters: Properly implement PID checks · c8b32408

由 Leon Romanovsky 提交于 8月 15, 2019

"Auto" configuration mode is called for visible in that PID
namespace and it ensures that all counters and QPs are coexist
in the same namespace and belong to same PID.

Fixes: 99fa331d ("RDMA/counter: Add "auto" configuration mode support")
Reviewed-by: NMark Zhang <markz@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-3-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

c8b32408

IB/core: Fix NULL pointer dereference when bind QP to counter · 948a7287

由 Ido Kalir 提交于 8月 15, 2019

If QP is not visible to the pid, then we try to decrease its reference
count and return from the function before the QP pointer is
initialized. This lead to NULL pointer dereference.
Fix it by pass directly the res to the rdma_restract_put as arg instead of
&qp->res.

This fixes below call trace:
[ 5845.110329] BUG: kernel NULL pointer dereference, address:
00000000000000dc
[ 5845.120482] Oops: 0002 [#1] SMP PTI
[ 5845.129119] RIP: 0010:rdma_restrack_put+0x5/0x30 [ib_core]
[ 5845.169450] Call Trace:
[ 5845.170544]  rdma_counter_get_qp+0x5c/0x70 [ib_core]
[ 5845.172074]  rdma_counter_bind_qpn_alloc+0x6f/0x1a0 [ib_core]
[ 5845.173731]  nldev_stat_set_doit+0x314/0x330 [ib_core]
[ 5845.175279]  rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core]
[ 5845.176772]  ? __kmalloc_node_track_caller+0x20b/0x2b0
[ 5845.178321]  rdma_nl_rcv+0xcb/0x120 [ib_core]
[ 5845.179753]  netlink_unicast+0x179/0x220
[ 5845.181066]  netlink_sendmsg+0x2d8/0x3d0
[ 5845.182338]  sock_sendmsg+0x30/0x40
[ 5845.183544]  __sys_sendto+0xdc/0x160
[ 5845.184832]  ? syscall_trace_enter+0x1f8/0x2e0
[ 5845.186209]  ? __audit_syscall_exit+0x1d9/0x280
[ 5845.187584]  __x64_sys_sendto+0x24/0x30
[ 5845.188867]  do_syscall_64+0x48/0x120
[ 5845.190097]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 1bd8e0a9 ("RDMA/counter: Allow manual mode configuration support")
Signed-off-by: NIdo Kalir <idok@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-2-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

948a7287

RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB · 27b7fb1a

由 Jason Gunthorpe 提交于 8月 15, 2019

When ODP is enabled with IB_ACCESS_HUGETLB then the required pages
should be calculated based on the extent of the MR, which is rounded
to the nearest huge page alignment.

Fixes: d2183c6f ("RDMA/umem: Move page_shift from ib_umem to ib_odp_umem")
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-5-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

27b7fb1a

RDMA: Delete DEBUG code · b2299e83

由 Leon Romanovsky 提交于 8月 19, 2019

There is no need to keep DEBUG defines for out-of-the tree testing.
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190819114547.20704-1-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

b2299e83

16 8月, 2019 1 次提交

PCI/P2PDMA: Introduce pci_p2pdma_unmap_sg() · 7f73eac3

由 Logan Gunthorpe 提交于 8月 12, 2019

Add pci_p2pdma_unmap_sg() to the two places that call pci_p2pdma_map_sg().

This is a prep patch to introduce correct mappings for p2pdma transactions
that go through the root complex.

Link: https://lore.kernel.org/r/20190730163545.4915-10-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-10-logang@deltatee.comSigned-off-by: NLogan Gunthorpe <logang@deltatee.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

7f73eac3

12 8月, 2019 3 次提交

RDMA/core: Fix error code in stat_get_doit_qp() · 932727c5

由 Dan Carpenter 提交于 8月 09, 2019

We need to set the error codes on these paths. Currently the only
possible error code is -EMSGSIZE so that's what the patch uses.

Fixes: 83c2c1fc ("RDMA/nldev: Allow get counter mode through RDMA netlink")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190809101311.GA17867@mwandaSigned-off-by: NDoug Ledford <dledford@redhat.com>

932727c5

RDMA/core: Add common iWARP query port · 4929116b

由 Kamal Heib 提交于 8月 07, 2019

Add support for a common iWARP query port function, the new function
includes a common code that is used by the iWARP devices to update the
port attributes like max_mtu, active_mtu, state, and phys_state, the
function also includes a call for the driver-specific query_port callback
to query the device-specific port attributes.
Signed-off-by: NKamal Heib <kamalheib1@gmail.com>
Link: https://lore.kernel.org/r/20190807103138.17219-4-kamalheib1@gmail.comSigned-off-by: NDoug Ledford <dledford@redhat.com>

4929116b

RDMA: Introduce ib_port_phys_state enum · 72a7720f

由 Kamal Heib 提交于 8月 07, 2019

In order to improve readability, add ib_port_phys_state enum to replace
the use of magic numbers.
Signed-off-by: NKamal Heib <kamalheib1@gmail.com>
Reviewed-by: NAndrew Boyer <aboyer@tobark.org>
Acked-by: NMichal Kalderon <michal.kalderon@marvell.com>
Acked-by: NBernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190807103138.17219-2-kamalheib1@gmail.comSigned-off-by: NDoug Ledford <dledford@redhat.com>

72a7720f

08 8月, 2019 2 次提交

RDMA/counter: Prevent QP counter binding if counters unsupported · d97de888

由 Mark Zhang 提交于 8月 07, 2019

In case of rdma_counter_init() fails, counter allocation and QP bind
should not be allowed.

Fixes: 413d3347 ("RDMA/counter: Add set/clear per-port auto mode support")
Fixes: 1bd8e0a9 ("RDMA/counter: Allow manual mode configuration support")
Signed-off-by: NMark Zhang <markz@mellanox.com>
Reviewed-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190807101819.7581-1-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

d97de888

IB/mlx5: Fix implicit MR release flow · f591822c

由 Yishai Hadas 提交于 8月 05, 2019

Once implicit MR is being called to be released by
ib_umem_notifier_release() its leaves were marked as "dying".

However, when dereg_mr()->mlx5_ib_free_implicit_mr()->mr_leaf_free() is
called, it skips running the mr_leaf_free_action (i.e. umem_odp->work)
when those leaves were marked as "dying".

As such ib_umem_release() for the leaves won't be called and their MRs
will be leaked as well.

When an application exits/killed without calling dereg_mr we might hit the
above flow.

This fatal scenario is reported by WARN_ON() upon
mlx5_ib_dealloc_ucontext() as ibcontext->per_mm_list is not empty, the
call trace can be seen below.

Originally the "dying" mark as part of ib_umem_notifier_release() was
introduced to prevent pagefault_mr() from returning a success response
once this happened. However, we already have today the completion
mechanism so no need for that in those flows any more.  Even in case a
success response will be returned the firmware will not find the pages and
an error will be returned in the following call as a released mm will
cause ib_umem_odp_map_dma_pages() to permanently fail mmget_not_zero().

Fix the above issue by dropping the "dying" from the above flows.  The
other flows that are using "dying" are still needed it for their
synchronization purposes.

   WARNING: CPU: 1 PID: 7218 at
   drivers/infiniband/hw/mlx5/main.c:2004
		  mlx5_ib_dealloc_ucontext+0x84/0x90 [mlx5_ib]
   CPU: 1 PID: 7218 Comm: ibv_rc_pingpong Tainted: G     E
	       5.2.0-rc6+ #13
   Call Trace:
   uverbs_destroy_ufile_hw+0xb5/0x120 [ib_uverbs]
   ib_uverbs_close+0x1f/0x80 [ib_uverbs]
   __fput+0xbe/0x250
   task_work_run+0x88/0xa0
   do_exit+0x2cb/0xc30
   ? __fput+0x14b/0x250
   do_group_exit+0x39/0xb0
   get_signal+0x191/0x920
   ? _raw_spin_unlock_bh+0xa/0x20
   ? inet_csk_accept+0x229/0x2f0
   do_signal+0x36/0x5e0
   ? put_unused_fd+0x5b/0x70
   ? __sys_accept4+0x1a6/0x1e0
   ? inet_hash+0x35/0x40
   ? release_sock+0x43/0x90
   ? _raw_spin_unlock_bh+0xa/0x20
   ? inet_listen+0x9f/0x120
   exit_to_usermode_loop+0x5c/0xc6
   do_syscall_64+0x182/0x1b0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 81713d37 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20190805083010.21777-1-leon@kernel.orgSigned-off-by: NYishai Hadas <yishaih@mellanox.com>
Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

f591822c

05 8月, 2019 1 次提交

rdma: Enable ib_alloc_cq to spread work over a device's comp_vectors · 20cf4e02

由 Chuck Lever 提交于 7月 29, 2019

Send and Receive completion is handled on a single CPU selected at
the time each Completion Queue is allocated. Typically this is when
an initiator instantiates an RDMA transport, or when a target
accepts an RDMA connection.

Some ULPs cannot open a connection per CPU to spread completion
workload across available CPUs and MSI vectors. For such ULPs,
provide an API that allows the RDMA core to select a completion
vector based on the device's complement of available comp_vecs.

ULPs that invoke ib_alloc_cq() with only comp_vector 0 are converted
to use the new API so that their completion workloads interfere less
with each other.
Suggested-by: NHåkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Cc: <linux-cifs@vger.kernel.org>
Cc: <v9fs-developer@lists.sourceforge.net>
Link: https://lore.kernel.org/r/20190729171923.13428.52555.stgit@manet.1015granger.netSigned-off-by: NDoug Ledford <dledford@redhat.com>

20cf4e02

01 8月, 2019 5 次提交

IB/mad: Fix use-after-free in ib mad completion handling · 770b7d96

由 Jack Morgenstein 提交于 8月 01, 2019

We encountered a use-after-free bug when unloading the driver:

[ 3562.116059] BUG: KASAN: use-after-free in ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
[ 3562.117233] Read of size 4 at addr ffff8882ca5aa868 by task kworker/u13:2/23862
[ 3562.118385]
[ 3562.119519] CPU: 2 PID: 23862 Comm: kworker/u13:2 Tainted: G           OE     5.1.0-for-upstream-dbg-2019-05-19_16-44-30-13 #1
[ 3562.121806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[ 3562.123075] Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
[ 3562.124383] Call Trace:
[ 3562.125640]  dump_stack+0x9a/0xeb
[ 3562.126911]  print_address_description+0xe3/0x2e0
[ 3562.128223]  ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
[ 3562.129545]  __kasan_report+0x15c/0x1df
[ 3562.130866]  ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
[ 3562.132174]  kasan_report+0xe/0x20
[ 3562.133514]  ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
[ 3562.134835]  ? find_mad_agent+0xa00/0xa00 [ib_core]
[ 3562.136158]  ? qlist_free_all+0x51/0xb0
[ 3562.137498]  ? mlx4_ib_sqp_comp_worker+0x1970/0x1970 [mlx4_ib]
[ 3562.138833]  ? quarantine_reduce+0x1fa/0x270
[ 3562.140171]  ? kasan_unpoison_shadow+0x30/0x40
[ 3562.141522]  ib_mad_recv_done+0xdf6/0x3000 [ib_core]
[ 3562.142880]  ? _raw_spin_unlock_irqrestore+0x46/0x70
[ 3562.144277]  ? ib_mad_send_done+0x1810/0x1810 [ib_core]
[ 3562.145649]  ? mlx4_ib_destroy_cq+0x2a0/0x2a0 [mlx4_ib]
[ 3562.147008]  ? _raw_spin_unlock_irqrestore+0x46/0x70
[ 3562.148380]  ? debug_object_deactivate+0x2b9/0x4a0
[ 3562.149814]  __ib_process_cq+0xe2/0x1d0 [ib_core]
[ 3562.151195]  ib_cq_poll_work+0x45/0xf0 [ib_core]
[ 3562.152577]  process_one_work+0x90c/0x1860
[ 3562.153959]  ? pwq_dec_nr_in_flight+0x320/0x320
[ 3562.155320]  worker_thread+0x87/0xbb0
[ 3562.156687]  ? __kthread_parkme+0xb6/0x180
[ 3562.158058]  ? process_one_work+0x1860/0x1860
[ 3562.159429]  kthread+0x320/0x3e0
[ 3562.161391]  ? kthread_park+0x120/0x120
[ 3562.162744]  ret_from_fork+0x24/0x30
...
[ 3562.187615] Freed by task 31682:
[ 3562.188602]  save_stack+0x19/0x80
[ 3562.189586]  __kasan_slab_free+0x11d/0x160
[ 3562.190571]  kfree+0xf5/0x2f0
[ 3562.191552]  ib_mad_port_close+0x200/0x380 [ib_core]
[ 3562.192538]  ib_mad_remove_device+0xf0/0x230 [ib_core]
[ 3562.193538]  remove_client_context+0xa6/0xe0 [ib_core]
[ 3562.194514]  disable_device+0x14e/0x260 [ib_core]
[ 3562.195488]  __ib_unregister_device+0x79/0x150 [ib_core]
[ 3562.196462]  ib_unregister_device+0x21/0x30 [ib_core]
[ 3562.197439]  mlx4_ib_remove+0x162/0x690 [mlx4_ib]
[ 3562.198408]  mlx4_remove_device+0x204/0x2c0 [mlx4_core]
[ 3562.199381]  mlx4_unregister_interface+0x49/0x1d0 [mlx4_core]
[ 3562.200356]  mlx4_ib_cleanup+0xc/0x1d [mlx4_ib]
[ 3562.201329]  __x64_sys_delete_module+0x2d2/0x400
[ 3562.202288]  do_syscall_64+0x95/0x470
[ 3562.203277]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The problem was that the MAD PD was deallocated before the MAD CQ.
There was completion work pending for the CQ when the PD got deallocated.
When the mad completion handling reached procedure
ib_mad_post_receive_mads(), we got a use-after-free bug in the following
line of code in that procedure:
   sg_list.lkey = qp_info->port_priv->pd->local_dma_lkey;
(the pd pointer in the above line is no longer valid, because the
pd has been deallocated).

We fix this by allocating the PD before the CQ in procedure
ib_mad_port_open(), and deallocating the PD after freeing the CQ
in procedure ib_mad_port_close().

Since the CQ completion work queue is flushed during ib_free_cq(),
no completions will be pending for that CQ when the PD is later
deallocated.

Note that freeing the CQ before deallocating the PD is the practice
in the ULPs.

Fixes: 4be90bc6 ("IB/mad: Remove ib_get_dma_mr calls")
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190801121449.24973-1-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

770b7d96

RDMA/restrack: Track driver QP types in resource tracker · 52e0a118

由 Gal Pressman 提交于 8月 01, 2019

The check for QP type different than XRC has excluded driver QP
types from the resource tracker.
As a result, "rdma resource show" user command would not show opened
driver QPs which does not reflect the real state of the system.

Check QP type explicitly instead of assuming enum values/ordering.

Fixes: 40909f66 ("RDMA/efa: Add EFA verbs implementation")
Signed-off-by: NGal Pressman <galpress@amazon.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190801104354.11417-1-galpress@amazon.comSigned-off-by: NDoug Ledford <dledford@redhat.com>

52e0a118

RDMA/devices: Remove the lock around remove_client_context · 9cd58817

由 Jason Gunthorpe 提交于 7月 31, 2019

Due to the complexity of client->remove() callbacks it is desirable to not
hold any locks while calling them. Remove the last one by tracking only
the highest client ID and running backwards from there over the xarray.

Since the only purpose of that lock was to protect the linked list, we can
drop the lock.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190731081841.32345-3-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

9cd58817

RDMA/devices: Do not deadlock during client removal · 621e55ff

由 Jason Gunthorpe 提交于 7月 31, 2019

lockdep reports:

   WARNING: possible circular locking dependency detected

   modprobe/302 is trying to acquire lock:
   0000000007c8919c ((wq_completion)ib_cm){+.+.}, at: flush_workqueue+0xdf/0x990

   but task is already holding lock:
   000000002d3d2ca9 (&device->client_data_rwsem){++++}, at: remove_client_context+0x79/0xd0 [ib_core]

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #2 (&device->client_data_rwsem){++++}:
          down_read+0x3f/0x160
          ib_get_net_dev_by_params+0xd5/0x200 [ib_core]
          cma_ib_req_handler+0x5f6/0x2090 [rdma_cm]
          cm_process_work+0x29/0x110 [ib_cm]
          cm_req_handler+0x10f5/0x1c00 [ib_cm]
          cm_work_handler+0x54c/0x311d [ib_cm]
          process_one_work+0x4aa/0xa30
          worker_thread+0x62/0x5b0
          kthread+0x1ca/0x1f0
          ret_from_fork+0x24/0x30

   -> #1 ((work_completion)(&(&work->work)->work)){+.+.}:
          process_one_work+0x45f/0xa30
          worker_thread+0x62/0x5b0
          kthread+0x1ca/0x1f0
          ret_from_fork+0x24/0x30

   -> #0 ((wq_completion)ib_cm){+.+.}:
          lock_acquire+0xc8/0x1d0
          flush_workqueue+0x102/0x990
          cm_remove_one+0x30e/0x3c0 [ib_cm]
          remove_client_context+0x94/0xd0 [ib_core]
          disable_device+0x10a/0x1f0 [ib_core]
          __ib_unregister_device+0x5a/0xe0 [ib_core]
          ib_unregister_device+0x21/0x30 [ib_core]
          mlx5_ib_stage_ib_reg_cleanup+0x9/0x10 [mlx5_ib]
          __mlx5_ib_remove+0x3d/0x70 [mlx5_ib]
          mlx5_ib_remove+0x12e/0x140 [mlx5_ib]
          mlx5_remove_device+0x144/0x150 [mlx5_core]
          mlx5_unregister_interface+0x3f/0xf0 [mlx5_core]
          mlx5_ib_cleanup+0x10/0x3a [mlx5_ib]
          __x64_sys_delete_module+0x227/0x350
          do_syscall_64+0xc3/0x6a4
          entry_SYSCALL_64_after_hwframe+0x49/0xbe

Which is due to the read side of the client_data_rwsem being obtained
recursively through a work queue flush during cm client removal.

The lock is being held across the remove in remove_client_context() so
that the function is a fence, once it returns the client is removed. This
is required so that the two callers do not proceed with destruction until
the client completes removal.

Instead of using client_data_rwsem use the existing device unregistration
refcount and add a similar client unregistration (client->uses) refcount.

This will fence the two unregistration paths without holding any locks.

Cc: <stable@vger.kernel.org>
Fixes: 921eab11 ("RDMA/devices: Re-organize device.c locking")
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190731081841.32345-2-leon@kernel.orgSigned-off-by: NDoug Ledford <dledford@redhat.com>

621e55ff

IB/core: Add mitigation for Spectre V1 · 61f25982

由 Luck, Tony 提交于 7月 30, 2019

Some processors may mispredict an array bounds check and
speculatively access memory that they should not. With
a user supplied array index we like to play things safe
by masking the value with the array size before it is
used as an index.
Signed-off-by: NTony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20190731043957.GA1600@agluck-desk2.amr.corp.intel.comSigned-off-by: NDoug Ledford <dledford@redhat.com>

61f25982

31 7月, 2019 1 次提交

RDMA/core: fix spelling mistake "Nelink" -> "Netlink" · 1dc55892

由 Colin Ian King 提交于 7月 31, 2019

There is a spelling mistake in a warning message, fix it.
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190731080144.18327-1-colin.king@canonical.comSigned-off-by: NDoug Ledford <dledford@redhat.com>

1dc55892

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功