Commits · 611cb92b082ad16b2fe1258e51d5aca7de540dfb · openeuler / Kernel

28 Mar, 2018 1 commit

RDMA/ucma: Fix uABI structure layouts for 32/64 compat · 611cb92b

Authored by Jason Gunthorpe on Mar 20, 2018

The rdma_ucm_event_resp is a different length on 32 and 64 bit compiles.

The kernel requires it to be the expected length or longer so 32 bit
builds running on a 64 bit kernel will not work.

Retain full compat by having all kernels accept a struct with or without
the trailing reserved field.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

611cb92b
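The length rule described in this commit can be sketched in user-space C. This is only an illustration: `struct demo_resp`, its fields, and `demo_resp_copy_len()` are hypothetical stand-ins and not the real `rdma_ucm_event_resp` layout or the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical response layout: one 64-bit field followed by 32-bit fields.
 * Without the explicit tail field, ABIs that align uint64_t to 8 bytes pad
 * the struct and ABIs that align it to 4 bytes do not, so sizeof() differs. */
struct demo_resp {
	uint64_t uid;
	uint32_t id;
	uint32_t event;
	uint32_t status;
	uint32_t reserved;	/* explicit tail so both ABIs agree on the size */
};

static_assert(sizeof(struct demo_resp) == 24, "layout is padding-free");

/* Length of the struct without the trailing reserved field. */
#define DEMO_RESP_MIN_LEN offsetof(struct demo_resp, reserved)

/* Accept a caller buffer sized with or without the reserved field.
 * Returns the number of bytes to copy out, or 0 if the buffer is too short. */
static size_t demo_resp_copy_len(size_t out_len)
{
	if (out_len < DEMO_RESP_MIN_LEN)
		return 0;
	return out_len < sizeof(struct demo_resp) ? out_len
						  : sizeof(struct demo_resp);
}
```

With this shape, a caller that passes the shorter length gets the struct without the reserved tail, and a caller that passes the full length (or more) gets the whole struct.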

15 Mar, 2018 1 commit

RDMA/ucma: Don't allow join attempts for unsupported AF family · 0c81ffc6

Authored by Leon Romanovsky on Mar 13, 2018

Users can pass garbage to ucma_join_ip_multicast(), which indirectly
causes rdma_addr_size() to return 0. The subsequent call to
ucma_process_join() has the right checks, but it is better to
check the input as early as possible.

The following crash from syzkaller revealed it.

kernel BUG at lib/string.c:1052!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4113 Comm: syz-executor0 Not tainted 4.16.0-rc5+ #261
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:fortify_panic+0x13/0x20 lib/string.c:1051
RSP: 0018:ffff8801ca81f8f0 EFLAGS: 00010286
RAX: 0000000000000022 RBX: 1ffff10039503f23 RCX: 0000000000000000
RDX: 0000000000000022 RSI: 1ffff10039503ed3 RDI: ffffed0039503f12
RBP: ffff8801ca81f8f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000000 R12: ffff8801ca81f998
R13: ffff8801ca81f938 R14: ffff8801ca81fa58 R15: 000000000000fa00
FS:  0000000000000000(0000) GS:ffff8801db200000(0063) knlGS:000000000a12a900
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000008138024 CR3: 00000001cbb58004 CR4: 00000000001606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 memcpy include/linux/string.h:344 [inline]
 ucma_join_ip_multicast+0x36b/0x3b0 drivers/infiniband/core/ucma.c:1421
 ucma_write+0x2d6/0x3d0 drivers/infiniband/core/ucma.c:1633
 __vfs_write+0xef/0x970 fs/read_write.c:480
 vfs_write+0x189/0x510 fs/read_write.c:544
 SYSC_write fs/read_write.c:589 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:581
 do_syscall_32_irqs_on arch/x86/entry/common.c:330 [inline]
 do_fast_syscall_32+0x3ec/0xf9f arch/x86/entry/common.c:392
 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7f9ec99
RSP: 002b:00000000ff8172cc EFLAGS: 00000282 ORIG_RAX: 0000000000000004
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020000100
RDX: 0000000000000063 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Code: 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 48 89 df e8 42 2c e3 fb eb de
55 48 89 fe 48 c7 c7 80 75 98 86 48 89 e5 e8 85 95 94 fb <0f> 0b 90 90 90 90
90 90 90 90 90 90 90 55 48 89 e5 41 57 41 56
RIP: fortify_panic+0x13/0x20 lib/string.c:1051 RSP: ffff8801ca81f8f0

Fixes: 5bc2b7b3 ("RDMA/ucma: Allow user space to specify AF_IB when joining multicast")
Reported-by: <syzbot+2287ac532caa81900a4e@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

0c81ffc6
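A minimal user-space sketch of the "check the input early" idea follows. `demo_addr_size()` only imitates the behaviour the commit attributes to rdma_addr_size() (returning 0 for an unsupported family), and `demo_join()` is a hypothetical wrapper; AF_IB is left out because user-space headers do not provide `sockaddr_ib`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Size of a supported address family, or 0 for anything else. */
static size_t demo_addr_size(const struct sockaddr *addr)
{
	switch (addr->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}

/* Hypothetical join entry point: reject an unsupported family before
 * any bytes are copied, instead of relying on later checks. */
static int demo_join(struct sockaddr_storage *dst, const struct sockaddr *src)
{
	size_t len = demo_addr_size(src);

	if (len == 0)
		return -EINVAL;
	assert(len <= sizeof(*dst));
	memcpy(dst, src, len);
	return 0;
}
```

The point of the sketch is the ordering: the size lookup and its zero check happen before the copy, so a garbage family never reaches memcpy().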

09 Mar, 2018 1 commit

RDMA/nldev: provide detailed CM_ID information · 00313983

Authored by Steve Wise on Mar 01, 2018

Implement RDMA nldev netlink interface to get detailed CM_ID information.

Because cm_id's are attached to rdma devices in various work queue
contexts, the pid and task information at restrack_add() time is sometimes
not useful. For example, an nvme/f host connection cm_id ends up being
bound to a device in a work queue context and the resulting pid at attach
time no longer exists after connection setup. So instead we mark all
cm_id's created via the rdma_ucm as "user", and all others as "kernel".
This required tweaking the restrack code a little. It also required
wrapping some rdma_cm functions to allow passing the module name string.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

00313983

08 Mar, 2018 2 commits

RDMA/ucma: Check that user doesn't overflow QP state · a5880b84

Authored by Leon Romanovsky on Mar 07, 2018

The QP state is limited and declared in enum ib_qp_state,
but ucma user was able to supply any possible (u32) value.

Reported-by: syzbot+0df1ab766f8924b1edba@syzkaller.appspotmail.com
Fixes: 75216638 ("RDMA/cma: Export rdma cm interface to userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a5880b84
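The range check this commit describes can be illustrated with a short sketch. The enum below is a local, illustrative copy of the idea behind enum ib_qp_state, and `demo_check_qp_state()` is a hypothetical helper, not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative state list; the last enumerator marks the upper bound. */
enum demo_qp_state {
	DEMO_QPS_RESET,
	DEMO_QPS_INIT,
	DEMO_QPS_RTR,
	DEMO_QPS_RTS,
	DEMO_QPS_SQD,
	DEMO_QPS_SQE,
	DEMO_QPS_ERR
};

static_assert(DEMO_QPS_ERR == 6, "seven states, numbered from 0");

/* The caller supplies a raw 32-bit value; anything past the last
 * enumerator is rejected before it is used as a state. */
static int demo_check_qp_state(uint32_t user_state)
{
	if (user_state > (uint32_t)DEMO_QPS_ERR)
		return -EINVAL;
	return 0;
}
```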

RDMA/ucma: Limit possible option size · 6a21dfc0

Authored by Leon Romanovsky on Mar 07, 2018

Users of ucma are supposed to provide the size of the option value.
In most paths it is expected to equal the size of a u8 or u16, but
that is not the case for the IB path record, where it can be a
multiple of struct ib_path_rec_data.

This patch takes the simplest possible approach and rejects
values larger than can be allocated.

Reported-by: syzbot+a38b0e9f694c379ca7ce@syzkaller.appspotmail.com
Fixes: 7ce86409 ("RDMA/ucma: Allow user space to set service type")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

6a21dfc0
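As a rough user-space illustration of capping a caller-supplied option length before allocating: `DEMO_MAX_OPTLEN` is an arbitrary, hypothetical cap chosen for this sketch (the commit only says the limit is what can be allocated), and `demo_copy_option()` is not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound on an option blob, for illustration only. */
#define DEMO_MAX_OPTLEN (4u * 1024u * 1024u)

/* Copy a caller-supplied option blob, refusing lengths above the cap.
 * On success *out owns a heap copy that the caller must free(). */
static int demo_copy_option(const void *opt, size_t optlen, void **out)
{
	void *buf;

	assert(opt != NULL && out != NULL);
	if (optlen == 0 || optlen > DEMO_MAX_OPTLEN)
		return -EINVAL;
	buf = malloc(optlen);
	if (buf == NULL)
		return -ENOMEM;
	memcpy(buf, opt, optlen);
	*out = buf;
	return 0;
}
```

The length is validated before the allocation and before the source buffer is read, so an oversized request fails fast without touching memory.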

12 Feb, 2018 1 commit

vfs: do bulk POLL* -> EPOLL* replacement · a9a08845

Authored by Linus Torvalds on Feb 11, 2018

This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do.  But the keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a9a08845

20 Jan, 2018 1 commit

RDMA/ucma: Use rdma cm API to query GID · 7a2f64ee

Authored by Parav Pandit on Jan 18, 2018

Make use of rdma_read_gids() API to read SGID and DGID which returns
correct GIDs for RoCE and other transports.

rdma_addr_get_dgid() for RoCE for client side connections returns MAC
address, instead of DGID.
rdma_addr_get_sgid() for RoCE doesn't return correct SGID for IPv6 and
when more than one IP address is assigned to the netdevice.

Therefore use transport agnostic rdma_read_gids() API provided by rdma_cm
module.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

7a2f64ee

11 Jan, 2018 2 commits

RDMA/cma: Fix rdma_cm path querying for RoCE · 89838118

Authored by Parav Pandit on Jan 08, 2018

The 'if' logic in ucma_query_path was broken when OPA was introduced
and started to treat RoCE paths as OPA paths. Invert the logic
of the 'if' so only OPA paths are treated as OPA paths.

Otherwise the path records returned to rdma_cma users are mangled
when in RoCE mode.

Fixes: 57520751 ("IB/SA: Add OPA path record type")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

89838118

RDMA/{cma, ucma}: Simplify and rename rdma_set_ib_paths · fe75889f

Authored by Parav Pandit on Jan 08, 2018

Since 2006 no rdmacm-based application has made use of setting
multiple path records through the rdma_set_ib_paths API.

Therefore the code is simplified to allow setting a single path record entry.
Now that it sets only a single path, it is renamed to reflect that.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

fe75889f

29 Nov, 2017 1 commit

the rest of drivers/*: annotate ->poll() instances · afc9a42b

Authored by Al Viro on Jul 03, 2017

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

afc9a42b

09 Aug, 2017 1 commit

IB/core: Convert ah_attr from OPA to IB when copying to user · d541e455

Authored by Dasaratharaman Chandramouli on Jun 08, 2017

OPA address handle attributes that have 32 bit LIDs would have to
be converted to IB address handle attribute with the LID field
programmed in the GID before copying to user space.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d541e455

02 May, 2017 2 commits

IB/SA: Add OPA path record type · 57520751

Authored by Dasaratharaman Chandramouli on Apr 27, 2017

Add opa_sa_path_rec to sa_path_rec data structure.
The 'type' field in sa_path_rec identifies the
type of the path record.
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

57520751

IB/SA: Rename ib_sa_path_rec to sa_path_rec · c2f8fc4e

Authored by Dasaratharaman Chandramouli on Apr 27, 2017

Rename ib_sa_path_rec to a more generic sa_path_rec.
This is part of extending ib_sa to also support OPA
path records in addition to the IB defined path records.
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

c2f8fc4e

04 Dec, 2016 1 commit

infiniband: remove WARN that is not kernel bug · f73a1dbc

Authored by Leon Romanovsky on Nov 21, 2016

On Mon, Nov 21, 2016 at 09:52:53AM -0700, Jason Gunthorpe wrote:
> On Mon, Nov 21, 2016 at 02:14:08PM +0200, Leon Romanovsky wrote:
> > >
> > > In ib_ucm_write function there is a wrong prefix:
> > >
> > > + pr_err_once("ucm_write: process %d (%s) tried to do something hinky\n",
> >
> > I did it intentionally to have the same errors for all flows.
>
> Lets actually use a good message too please?
>
>  pr_err_once("ucm_write: process %d (%s) changed security contexts after opening FD, this is not allowed.\n",
>
> Jason

From 70f95b2d35aea42e5b97e7d27ab2f4e8effcbe67 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro@mellanox.com>
Date: Mon, 21 Nov 2016 13:30:59 +0200
Subject: [PATCH rdma-next V2] IB/{core, qib}: Remove WARN that is not kernel bug

WARNINGs mean kernel bugs, in this case, they are placed
to mark programming errors and/or malicious attempts.

BUG/WARNs that are not kernel bugs hinder automated testing efforts.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

f73a1dbc

08 Oct, 2016 1 commit

IB/ucma: Remove deprecated create_singlethread_workqueue · a190d3b0

Authored by Bhaktipriya Shridhar on Aug 15, 2016

alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces
deprecated create_singlethread_workqueue(). This is the identity
conversion.

The workqueue "close_wq" queues work items &ctx->close_work (maps to
ucma_close_id) and &con_req_eve->close_work (maps to
ucma_close_event_id). It has been identity converted.

WQ_MEM_RECLAIM has been set to ensure forward progress under
memory pressure.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a190d3b0

04 Aug, 2016 1 commit

IB/core: Support for CMA multicast join flags · ab15c95a

Authored by Alex Vesker on Jul 06, 2016

Added UCMA and CMA support for multicast join flags. Flags are
passed using UCMA CM join command previously reserved fields.
Currently supporting two join flags indicating two different
multicast JoinStates:

1. Full Member:
   The initiator creates the Multicast group(MCG) if it wasn't
   previously created, can send Multicast messages to the group
   and receive messages from the MCG.

2. Send Only Full Member:
   The initiator creates the Multicast group(MCG) if it wasn't
   previously created, can send Multicast messages to the group
   but doesn't receive any messages from the MCG.

   IB: Send Only Full Member requires a query of ClassPortInfo
       to determine if SM/SA supports this option. If SM/SA
       doesn't support Send-Only there will be no join request
       sent and an error will be returned.

   ETH: When Send Only Full Member is requested no IGMP join
	will be sent.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ab15c95a

29 Apr, 2016 1 commit

IB/security: Restrict use of the write() interface · e6bd18f5

Authored by Jason Gunthorpe on Apr 10, 2016

The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl().  This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.

For the immediate repair, detect and deny suspicious accesses to
the write API.

For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).

The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ]
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>

e6bd18f5

03 Mar, 2016 1 commit

IB/core: trivial prink cleanup. · aba25a3e

Authored by Parav Pandit on Mar 02, 2016

1. Replaced printk with appropriate pr_warn, pr_err, pr_info.
2. Removed unnecessary prints around memory allocation failure
which are not required, as reported by the checkpatch script.
Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

aba25a3e

29 Oct, 2015 2 commits

IB/ucma: Take the network namespace from the process · 95893dde

Authored by Guy Shapiro on Oct 22, 2015

Add support for network namespaces from user space. This is done by passing
the network namespace of the process instead of init_net.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

95893dde

IB/cma: Add support for network namespaces · fa20105e

Authored by Guy Shapiro on Oct 22, 2015

Add support for network namespaces in the ib_cma module. This is
accomplished by:

1. Adding network namespace parameter for rdma_create_id. This parameter is
   used to populate the network namespace field in rdma_id_private.
   rdma_create_id keeps a reference on the network namespace.
2. Using the network namespace from the rdma_id instead of init_net inside
   of ib_cma, when listening on an ID and when looking for an ID for an
   incoming request.
3. Decrementing the reference count for the appropriate network namespace
   when calling rdma_destroy_id.

In order to preserve the current behavior init_net is passed when calling
from other modules.
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

fa20105e

22 Oct, 2015 1 commit

IB/core: Remove smac and vlan id from qp_attr and ah_attr · aa744cc0

Authored by Matan Barak on Oct 15, 2015

Smac and vlan id could be resolved from the GID attribute, and thus
these attributes aren't needed anymore. Removing them.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

aa744cc0

21 Oct, 2015 1 commit

IB/ucma: check workqueue allocation before usage · 0174b381

Authored by Sasha Levin on Sep 17, 2015

Allocating a workqueue might fail, which wasn't checked so far and would
lead to NULL ptr derefs when an attempt to use it was made.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

0174b381

31 Aug, 2015 1 commit

IB/ucma: HW Device hot-removal support · e1c30298

Authored by Yishai Hadas on Aug 13, 2015

Currently, the IB/cma remove_one flow blocks until all user descriptors managed
by IB/ucma are released. This prevents hot-removal of IB devices. This patch
allows IB/cma to remove devices regardless of user space activity. Upon getting
the RDMA_CM_EVENT_DEVICE_REMOVAL event we close all the underlying HW resources
for the given ucontext. The ucontext itself stays alive until it is explicitly
destroyed by its creator.

Applications running at that time will be left with a zombie device, and
further operations on it may fail.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e1c30298

29 Aug, 2015 1 commit

IB/ucma: Fix theoretical user triggered use-after-free · 7e967fd0

Authored by Jason Gunthorpe on Aug 04, 2015

Something like this:

CPU A                         CPU B
========================      ================================
ucma_destroy_id()
 wait_for_completion()
                              .. anything
                                ucma_put_ctx()
                                  complete()
 .. continues ...
                              ucma_leave_multicast()
                               mutex_lock(mut)
                                 atomic_inc(ctx->ref)
                               mutex_unlock(mut)
 ucma_free_ctx()
  ucma_cleanup_multicast()
   mutex_lock(mut)
     kfree(mc)
                               rdma_leave_multicast(mc->ctx->cm_id,..

Fix it by latching the ref at 0. Once it goes to 0 mc and ctx cannot
leave the mutex(mut) protection.

The other atomic_inc in ucma_get_ctx is OK because mutex(mut) protects
it from racing with ucma_destroy_id.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

7e967fd0
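The "latch the ref at 0" idea from this commit can be sketched with C11 atomics. This is a user-space illustration only: `struct demo_ctx` and both helpers are hypothetical names, and the loop is similar in spirit to what the kernel's atomic_inc_not_zero() provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_ctx {
	atomic_int ref;
};

/* Take a reference only while the count is still non-zero. Once the count
 * has dropped to 0 it stays there, so a late caller cannot revive an
 * object that is already on its way to being freed. */
static bool demo_get_unless_zero(struct demo_ctx *ctx)
{
	int old = atomic_load(&ctx->ref);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&ctx->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this call released the last one. */
static bool demo_put(struct demo_ctx *ctx)
{
	int prev = atomic_fetch_sub(&ctx->ref, 1);

	assert(prev > 0);
	return prev == 1;
}
```

In the race shown in the diagram, the CPU B side would use the conditional get and back off when it returns false, rather than incrementing unconditionally.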

15 Jul, 2015 2 commits

IB/core: Destroy multcast_idr on module exit · 45d25420

Authored by Johannes Thumshirn on Jul 08, 2015

Destroy multcast_idr on module exit, reclaiming the allocated memory.

This was detected by the following semantic patch (written by Luis Rodriguez
<mcgrof@suse.com>)
<SmPL>
@ defines_module_init @
declarer name module_init, module_exit;
declarer name DEFINE_IDR;
identifier init;
@@

module_init(init);

@ defines_module_exit @
identifier exit;
@@

module_exit(exit);

@ declares_idr depends on defines_module_init && defines_module_exit @
identifier idr;
@@

DEFINE_IDR(idr);

@ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
identifier declares_idr.idr, defines_module_exit.exit;
@@

exit(void)
{
 ...
 idr_destroy(&idr);
 ...
}

@ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
identifier declares_idr.idr, defines_module_exit.exit;
@@

exit(void)
{
 ...
 +idr_destroy(&idr);
}

</SmPL>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>

45d25420

IB/ucma: Fix lockdep warning in ucma_lock_files · 31b57b87

Authored by Haggai Eran on Jul 07, 2015

The ucma_lock_files() locks the mut mutex on two files, e.g. for migrating
an ID. Use mutex_lock_nested() to prevent the warning below.

 =============================================
 [ INFO: possible recursive locking detected ]
 4.1.0-rc6-hmm+ #40 Tainted: G           O
 ---------------------------------------------
 pingpong_rpc_se/10260 is trying to acquire lock:
  (&file->mut){+.+.+.}, at: [<ffffffffa047ac55>] ucma_migrate_id+0xc5/0x248 [rdma_ucm]

 but task is already holding lock:
  (&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&file->mut);
   lock(&file->mut);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by pingpong_rpc_se/10260:
  #0:  (&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]

 stack backtrace:
 CPU: 0 PID: 10260 Comm: pingpong_rpc_se Tainted: G           O    4.1.0-rc6-hmm+ #40
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
  ffff8801f85b63d0 ffff880195677b58 ffffffff81668f49 0000000000000001
  ffffffff825cbbe0 ffff880195677c38 ffffffff810bb991 ffff880100000000
  ffff880100000000 ffff880100000001 ffff8801f85b7010 ffffffff8121bee9
 Call Trace:
  [<ffffffff81668f49>] dump_stack+0x4f/0x6e
  [<ffffffff810bb991>] __lock_acquire+0x741/0x1820
  [<ffffffff8121bee9>] ? dput+0x29/0x320
  [<ffffffff810bcb38>] lock_acquire+0xc8/0x240
  [<ffffffffa047ac55>] ? ucma_migrate_id+0xc5/0x248 [rdma_ucm]
  [<ffffffff8166b901>] ? mutex_lock_nested+0x291/0x3e0
  [<ffffffff8166b6d5>] mutex_lock_nested+0x65/0x3e0
  [<ffffffffa047ac55>] ? ucma_migrate_id+0xc5/0x248 [rdma_ucm]
  [<ffffffff810baeed>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff8166b66e>] ? mutex_unlock+0xe/0x10
  [<ffffffffa047ac55>] ucma_migrate_id+0xc5/0x248 [rdma_ucm]
  [<ffffffffa0478474>] ucma_write+0xa4/0xb0 [rdma_ucm]
  [<ffffffff81200674>] __vfs_write+0x34/0x100
  [<ffffffff8112427c>] ? __audit_syscall_entry+0xac/0x110
  [<ffffffff810ec055>] ? current_kernel_time+0xc5/0xe0
  [<ffffffff812aa4d3>] ? security_file_permission+0x23/0x90
  [<ffffffff8120088d>] ? rw_verify_area+0x5d/0xe0
  [<ffffffff812009bb>] vfs_write+0xab/0x120
  [<ffffffff81201519>] SyS_write+0x59/0xd0
  [<ffffffff8112427c>] ? __audit_syscall_entry+0xac/0x110
  [<ffffffff8166ffee>] system_call_fastpath+0x12/0x76
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

31b57b87
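Taking the same kind of lock on two objects, as this commit describes, is commonly made deadlock-free by fixing the acquisition order. The sketch below uses pthread mutexes and orders by address, which is an assumption made for this illustration; mutex_lock_nested() itself is a lockdep annotation with no pthread equivalent, so it does not appear here. `struct demo_file` and both helpers are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct demo_file {
	pthread_mutex_t mut;
};

/* Lock two distinct files in a fixed (address) order so that two threads
 * locking the same pair can never deadlock against each other. */
static void demo_lock_files(struct demo_file *a, struct demo_file *b)
{
	assert(a != b);
	if ((uintptr_t)a < (uintptr_t)b) {
		pthread_mutex_lock(&a->mut);
		pthread_mutex_lock(&b->mut);
	} else {
		pthread_mutex_lock(&b->mut);
		pthread_mutex_lock(&a->mut);
	}
}

/* Release both locks; the unlock order does not matter for correctness. */
static void demo_unlock_files(struct demo_file *a, struct demo_file *b)
{
	pthread_mutex_unlock(&a->mut);
	pthread_mutex_unlock(&b->mut);
}
```

Because both mutexes belong to the same lock class, a checker such as lockdep cannot tell the two acquisitions apart without an explicit nesting annotation, which is the warning the commit silences.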

21 May, 2015 1 commit

IB/core: Change rdma_protocol_iboe to roce · 5d9fb044

Authored by Ira Weiny on May 14, 2015

After discussion upstream, it was agreed to transition the usage of iboe
in the kernel to roce. This keeps our terminology consistent with what
was finalized in the IBTA Annex 16 and IBTA Annex 17 publications.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

5d9fb044

19 May, 2015 2 commits

IB/Verbs: Use management helper rdma_cap_ib_sa() · fe53ba2f

Authored by Michael Wang on May 05, 2015

Introduce helper rdma_cap_ib_sa() to help us check if the port of an
IB device supports InfiniBand Subnet Administration.
Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

fe53ba2f

IB/Verbs: Reform route related part in IB-core cma · c72f2189

Authored by Michael Wang on May 05, 2015

Use raw management helpers to reform route related part in IB-core cma.
Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

c72f2189

18 Feb, 2015 1 commit

IB/core: When marshaling ucma path from user-space, clear unused fields · c2be9dc0

Authored by Ilya Nelkenbaum on Feb 05, 2015

When marshaling a user path to the kernel struct ib_sa_path, we need
to zero smac and dmac and set the vlan id to the "no vlan" value.

This is to ensure that Ethernet attributes are not used with
InfiniBand QPs.

Fixes: dd5f03be ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: Ilya Nelkenbaum <ilyan@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

c2be9dc0

19 Jan, 2014 1 commit

IB/cma: IBoE (RoCE) IP-based GID addressing · 7b85627b

Authored by Moni Shoua on Dec 12, 2013

Currently, the IB core and specifically the RDMA-CM assumes that IBoE
(RoCE) gids encode related Ethernet netdevice interface MAC address
and possibly VLAN id.

Change GIDs to be treated as if they encode the interface IP address.

Since Ethernet layer 2 address parameters are no longer encoded
within gids, we have to extend the Infiniband address structures (e.g.
ib_ah_attr) with layer 2 address parameters, namely mac and vlan.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

7b85627b

17 Nov, 2013 1 commit

IB/ucma: Convert use of typedef ctl_table to struct ctl_table · f3a5e3e3

Authored by Joe Perches on Oct 22, 2013

This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

f3a5e3e3

12 Nov, 2013 1 commit

RDMA/ucma: Discard events for IDs not yet claimed by user space · c6b21824

Authored by Sean Hefty on Nov 01, 2013

Problem reported by Avneesh Pant <avneesh.pant@oracle.com>:

It looks like we are triggering a bug in RDMA CM/UCM interaction.
The bug specifically hits when we have an incoming connection
request and the connecting process dies BEFORE the passive end of
the connection can process the request i.e. it does not call
rdma_get_cm_event() to retrieve the initial connection event. We
were able to triage this further and have some additional
information now.

In the example below when P1 dies after issuing a connect request
as the CM id is being destroyed all outstanding connects (to P2)
are sent a reject message. We see this reject message being
received on the passive end and the appropriate CM ID created for
the initial connection message being retrieved in cm_match_req().
The problem is in the ucma_event_handler() code when this reject
message is delivered to it and the initial connect message itself
HAS NOT been delivered to the client. In fact the client has not
even called rdma_get_cm_event() at this stage so we haven't
allocated a new ctx in ucma_get_event() and updated the new
connection CM_ID to point to the new UCMA context.

This results in the reject message not being dropped in
ucma_event_handler() for the new connection request as the
(if (!ctx->uid)) block is skipped since the ctx it refers to is
the listen CM id context which does have a valid UID associated
with it (I believe the new CMID for the connection initially
uses the listen CMID -> context when it is created in
cma_new_conn_id). Thus the assumption that new events for a
connection can get dropped in ucma_event_handler() is incorrect
IF the initial connect request has not been retrieved in the
first case. We end up getting a CM Reject event on the listen CM
ID and our upper layer code asserts (in fact this event does not
even have the listen_id set as that only gets set up by librdmacm
for connect requests).

The solution is to verify that the cm_id being reported in the event
is the same as the cm_id referenced by the ucma context. A mismatch
indicates that the ucma context corresponds to the listen. This fix
was validated by using a modified version of librdmacm that was able
to verify the problem and see that the reject message was indeed
dropped after this patch was applied.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

c6b21824

21 Jun, 2013 7 commits

RDMA/ucma: Allow user space to specify AF_IB when joining multicast · 5bc2b7b3

Authored by Sean Hefty on May 29, 2013

Allow user space applications to join multicast groups using MGIDs
directly.  MGIDs may be passed using AF_IB addresses.  Since the
current multicast join command only supports addresses as large as
sockaddr_in6, define a new structure for joining addresses specified
using sockaddr_ib.

Since AF_IB allows the user to specify the qkey when resolving a
remote UD QP address, when joining the multicast group use the qkey
value, if one has been assigned.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

5bc2b7b3

RDMA/ucma: Allow user space to pass AF_IB into resolve · 209cf2a7

Authored by Sean Hefty on May 29, 2013

Allow user space applications to call resolve_addr using AF_IB.  To
support sockaddr_ib, we need to define a new structure capable of
handling the larger address size.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

209cf2a7

RDMA/ucma: Allow user space to bind to AF_IB · eebe4c3a

Authored by Sean Hefty on May 29, 2013

Support user space binding to addresses using AF_IB.  Since
sockaddr_ib is larger than sockaddr_in6, we need to define a larger
structure when binding using AF_IB.  This time we use sockaddr_storage
to cover future cases.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

eebe4c3a

RDMA/ucma: Name changes to indicate only IP addresses supported · 05ad9457

Authored by Sean Hefty on May 29, 2013

Several commands into the RDMA CM from user space are restricted to
supporting addresses which fit into a sockaddr_in6 structure: bind
address, resolve address, and join multicast.

With the addition of AF_IB, we need to support addresses which are
larger than sockaddr_in6.  This will be done by adding new commands
that exchange address information using sockaddr_storage.  However, to
support existing applications, we maintain the current commands and
structures, but rename them to indicate that they only support IPv4
and v6 addresses.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

05ad9457

RDMA/ucma: Add ability to query GID addresses · edaa7a55

Authored by Sean Hefty on May 29, 2013

Part of address resolution is mapping IP addresses to IB GIDs. With
the changes to support querying larger addresses and more path records,
also provide a way to query IB GIDs after resolution completes.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

edaa7a55

RDMA/ucma: Support querying when IB paths are not reversible · ac53b264

Authored by Sean Hefty on May 29, 2013

The current query_route call can return up to two path records.  The
assumption being that one is the primary path, with optional support
for an alternate path.  In both cases, the paths are assumed to be
reversible and are used to send CM MADs.

With the ability to manually set IB path data, the rdma cm can
eventually be capable of using up to 6 paths per connection:

	forward primary, reverse primary,
	forward alternate, reverse alternate,
	reversible primary path for CM MADs
	reversible alternate path for CM MADs.

(It is unclear at this time if IB routing will complicate this)  In
order to handle more flexible routing topologies, add a new command to
report any number of paths.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

ac53b264

RDMA/ucma: Support querying for AF_IB addresses · ee7aed45

Authored by Sean Hefty on May 29, 2013

The sockaddr structure for AF_IB is larger than sockaddr_in6.  The
rdma cm user space ABI uses the latter to exchange address information
between user space and the kernel.

To support querying for larger addresses, define a new query command
that exchanges data using sockaddr_storage, rather than sockaddr_in6.
Unlike the existing query_route command, the new command only returns
address information.  Route (i.e. path record) data is separated.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

ee7aed45
