Commits · 647c75ac59a48a54dafd4475d14a645a0025a4f4 · openeuler / Kernel

10 Aug, 2017 · 6 commits

RDMA/netlink: Convert LS to doit callback · 647c75ac

Committed by Leon Romanovsky on Jun 15, 2017

The RDMA_NL_LS protocol does not actually dump anything; it sets
data, so it should be handled by a doit callback.

This patch converts RDMA_NL_LS to a doit callback, while
preserving the IWCM and RDMA_CM flows through netlink_dump_start().
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>

RDMA/core: Add and expose static device index · ecc82c53

Committed by Leon Romanovsky on Jun 18, 2017

This patch adds a static device index, similar to the one
already available in the netdev world (struct net->ifindex).

In downstream patches, RDMA netlink will use this idx-to-ib_device
conversion, so as part of this commit, we expose the translation
function to IB/core users.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
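
The index-to-device translation described in this commit can be pictured with a small user-space sketch. Everything here is hypothetical (`idxdemo_device`, `idxdemo_register`, `idxdemo_get_by_index`); it only illustrates the idea of assigning an index once at registration and resolving it later, not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_device; only what the sketch needs. */
struct idxdemo_device {
    unsigned int index;            /* static index, set once at registration */
    const char *name;
    struct idxdemo_device *next;   /* singly linked registry, newest first */
};

static struct idxdemo_device *idxdemo_devices;   /* registry head */
static unsigned int idxdemo_next_index = 1;      /* 0 is kept as "no device" */

/* Register a device and give it an index that never changes afterwards. */
static unsigned int idxdemo_register(struct idxdemo_device *dev)
{
    assert(dev != NULL);
    dev->index = idxdemo_next_index++;
    dev->next = idxdemo_devices;
    idxdemo_devices = dev;
    return dev->index;
}

/* Translate an index back to its device; NULL if nothing matches. */
static struct idxdemo_device *idxdemo_get_by_index(unsigned int index)
{
    for (struct idxdemo_device *d = idxdemo_devices; d; d = d->next)
        if (d->index == index)
            return d;
    return NULL;
}
```

A caller would register each device once, hand the returned index to user space, and later resolve it with `idxdemo_get_by_index()`. A real implementation would also need locking around the registry, which this sketch omits.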

RDMA/core: Add iterator over ib_devices · 8030c835

Committed by Leon Romanovsky on Jun 19, 2017

The upcoming nldev needs to iterate over all IB devices in the system.
To avoid exposing the ib_devices list outside of devices.c,
an iterator function is provided.

The current version is written explicitly for the nldev callback to avoid
over-engineering at this stage, but it can easily be extended to
other types.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
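
A callback-based iterator of the kind this commit describes might look like the sketch below. The names (`iterdemo_*`) and the callback signature are assumptions made for illustration; the point is that the list stays private to one file and callers only supply a function.

```c
#include <assert.h>
#include <stddef.h>

struct iterdemo_device {
    const char *name;
    struct iterdemo_device *next;
};

/* The list head is file-local, so other files cannot walk it directly. */
static struct iterdemo_device *iterdemo_devices;

typedef int (*iterdemo_cb)(struct iterdemo_device *dev, void *ctx);

static void iterdemo_add(struct iterdemo_device *dev)
{
    assert(dev != NULL);
    dev->next = iterdemo_devices;
    iterdemo_devices = dev;
}

/* Call cb for every device; stop early and return its value if non-zero. */
static int iterdemo_for_each(iterdemo_cb cb, void *ctx)
{
    assert(cb != NULL);
    for (struct iterdemo_device *d = iterdemo_devices; d; d = d->next) {
        int ret = cb(d, ctx);
        if (ret)
            return ret;
    }
    return 0;
}

/* Example callback: count the devices that were visited. */
static int iterdemo_count_cb(struct iterdemo_device *dev, void *ctx)
{
    (void)dev;
    ++*(unsigned int *)ctx;
    return 0;
}
```

Passing an opaque `ctx` pointer keeps the iterator generic enough for one caller today while leaving room for other callback types later.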

RDMA/netlink: Rename netlink callback struct · 3250b4db

Committed by Leon Romanovsky on Jun 19, 2017

The RDMA netlink client infrastructure was removed and made obsolete.
The old infrastructure defined struct ibnl_client_cbs. Now that all
uses of this have been updated to the new infrastructure, rename the
struct to be compliant with the current stack naming standards:
struct rdma_nl_cbs.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>

RDMA/netlink: Add flag to consolidate common handling · e3a2b93d

Committed by Leon Romanovsky on Jun 12, 2017

Add the ability to provide flags to control RDMA netlink callbacks
and convert addr.c and sa_query.c to be the first users of this
infrastructure. This allows their CAP_NET_ADMIN checks to move
into the netlink core.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
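
The idea of moving a privilege check out of each handler and into the dispatcher can be sketched as follows. This is a user-space illustration with hypothetical names (`NLDEMO_ADMIN_PERM`, `nldemo_dispatch`, and a boolean standing in for the capability check); it is not the kernel's callback table.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NLDEMO_ADMIN_PERM (1u << 0)   /* hypothetical flag: privileged callers only */

struct nldemo_cb {
    int (*doit)(int payload);
    unsigned int flags;
};

enum { NLDEMO_OP_RESOLVE, NLDEMO_OP_SET_TIMEOUT, NLDEMO_NUM_OPS };

static int nldemo_resolve(int payload) { return payload; }
static int nldemo_set_timeout(int payload) { return payload * 2; }

/* Each operation declares its requirements as data, next to its handler. */
static const struct nldemo_cb nldemo_table[NLDEMO_NUM_OPS] = {
    [NLDEMO_OP_RESOLVE]     = { .doit = nldemo_resolve,     .flags = 0 },
    [NLDEMO_OP_SET_TIMEOUT] = { .doit = nldemo_set_timeout, .flags = NLDEMO_ADMIN_PERM },
};

/* Core dispatcher: the permission check lives here once, not in each handler. */
static int nldemo_dispatch(unsigned int op, int payload, bool caller_is_admin)
{
    const struct nldemo_cb *cb;

    if (op >= NLDEMO_NUM_OPS)
        return -EINVAL;
    cb = &nldemo_table[op];
    assert(cb->doit != NULL);   /* every slot in the table is populated */
    if ((cb->flags & NLDEMO_ADMIN_PERM) && !caller_is_admin)
        return -EPERM;
    return cb->doit(payload);
}
```

With this layout a handler that needs privilege only sets a flag, and the handlers themselves contain no permission logic.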

RDMA/netlink: Remove netlink clients infrastructure · c9901724

Committed by Leon Romanovsky on Jun 05, 2017

RDMA netlink has a complicated infrastructure for dynamically
registering and de-registering netlink clients to the NETLINK_RDMA
group. The complicated portion of this code is not widely used because
2 of the 3 current clients are statically compiled together with
netlink.c. The infrastructure, therefore, is deemed overkill.

Refactor the code to eliminate the dynamically added clients. Now all
clients are pre-registered in a client array at compile time, and at run
time they merely check-in with the infrastructure to pass their callback
table for inclusion in the pre-sized client array.

This also allows for future cleanups and removal of unneeded code in the
iwcm* netlink handler.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>

07 Jul, 2017 · 1 commit

IB/core: Fix static analysis warning in ib_policy_change_task · a750cfde

Committed by Daniel Jurgens on Jul 05, 2017

ib_get_cached_subnet_prefix can technically fail, but the only way it
could is not possible based on the loop conditions. Check the return
value before using the variable sp to resolve a static analysis warning.

-v1:
- Fix check to !ret. Paul Moore

Fixes: 8f408ab6 ("selinux lsm IB/core: Implement LSM notification system")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>

24 May, 2017 · 2 commits

selinux lsm IB/core: Implement LSM notification system · 8f408ab6

Committed by Daniel Jurgens on May 19, 2017

Add a generic notification mechanism in the LSM. Interested consumers
can register a callback with the LSM and security modules can produce
events.

Because access to InfiniBand QPs is enforced in the setup phase of a
connection, security should be enforced again if the policy changes.
Register InfiniBand devices for policy change notification and check all
QPs on that device when the notification is received.

Add a call to the notification mechanism from SELinux when the AVC
cache changes or setenforce is cleared.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

IB/core: Enforce PKey security on QPs · d291f1a6

Committed by Daniel Jurgens on May 19, 2017

Add new LSM hooks to allocate and free security contexts and check for
permission to access a PKey.

Allocate and free a security context when creating and destroying a QP.
This context is used for controlling access to PKeys.

When a request is made to modify a QP that changes the port, PKey index,
or alternate path, check that the QP has permission for the PKey in the
PKey table index on the subnet prefix of the port. If the QP is shared
make sure all handles to the QP also have access.

Store which port and PKey index a QP is using. After the reset to init
transition the user can modify the port, PKey index and alternate path
independently. So port and PKey settings changes can be a merge of the
previous settings and the new ones.

In order to maintain access control if the PKey table or subnet
prefix changes, keep a list of all QPs that are using each PKey index on
each port. If a change occurs, all QPs using that device and port must
have access enforced for the new cache settings.

These changes add a transaction to the QP modify process. Association
with the old port and PKey index must be maintained if the modify fails,
and must be removed if it succeeds. Association with the new port and
PKey index must be established prior to the modify and removed if the
modify fails.

1. When a QP is modified to a particular Port, PKey index or alternate
   path insert that QP into the appropriate lists.

2. Check permission to access the new settings.

3. If step 2 grants access attempt to modify the QP.

4a. If steps 2 and 3 succeed remove any prior associations.

4b. If either fails remove the new setting associations.

If a PKey table or subnet prefix changes walk the list of QPs and
check that they have permission. If not send the QP to the error state
and raise a fatal error event. If it's a shared QP make sure all the
QPs that share the real_qp have permission as well. If the QP that
owns a security structure is denied access the security structure is
marked as such and the QP is added to an error_list. Once moving
the QP to error is complete the security structure mark is cleared.

Maintaining the lists correctly turns QP destroy into a transaction.
The hardware driver for the device frees the ib_qp structure, so while
the destroy is in progress the ib_qp pointer in the ib_qp_security
struct is undefined. When the destroy process begins the ib_qp_security
structure is marked as destroying. This prevents any action from being
taken on the QP pointer. After the QP is destroyed successfully it
could still be listed on an error_list; wait for it to be processed by that
flow before cleaning up the structure.

If the destroy fails the QP's port and PKey settings are reinserted into
the appropriate lists, the destroying flag is cleared, and access control
is enforced, in case there were any cache changes during the destroy
flow.

To keep the security changes isolated a new file is used to hold security
related functionality.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
[PM: merge fixup in ib_verbs.h and uverbs_cmd.c]
Signed-off-by: Paul Moore <paul@paul-moore.com>
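
The QP modify transaction (steps 1 to 4b in this commit message) can be sketched in user-space C. The types and functions below (`qpdemo_*`) are hypothetical, the per-port lists are reduced to a "current" and a "pending" slot, and the permission check and hardware modify are passed in as callbacks so each outcome can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* One (port, PKey index) association that a QP can be listed under. */
struct qpdemo_setting { int port; int pkey_index; };

struct qpdemo_qp {
    struct qpdemo_setting current;   /* association currently in force */
    bool has_current;
    struct qpdemo_setting pending;   /* association being tried */
    bool has_pending;
};

typedef bool (*qpdemo_check_fn)(const struct qpdemo_setting *s);
typedef int (*qpdemo_hw_fn)(const struct qpdemo_setting *s);

static int qpdemo_modify(struct qpdemo_qp *qp, struct qpdemo_setting new_s,
                         qpdemo_check_fn allowed, qpdemo_hw_fn hw_modify)
{
    int ret;

    assert(qp != NULL && allowed != NULL && hw_modify != NULL);

    /* Step 1: associate the QP with the new settings before modifying. */
    qp->pending = new_s;
    qp->has_pending = true;

    /* Step 2: check permission; step 3: only then attempt the modify. */
    if (!allowed(&new_s))
        ret = -EACCES;
    else
        ret = hw_modify(&new_s);

    /* Step 4a: on success the new association replaces the prior one. */
    if (ret == 0) {
        qp->current = qp->pending;
        qp->has_current = true;
    }
    /* Step 4b: on failure the new association is dropped, the old one kept. */
    qp->has_pending = false;
    return ret;
}

/* Example callbacks used to drive the three possible outcomes. */
static bool qpdemo_allow_all(const struct qpdemo_setting *s) { (void)s; return true; }
static bool qpdemo_deny_all(const struct qpdemo_setting *s) { (void)s; return false; }
static int qpdemo_hw_ok(const struct qpdemo_setting *s) { (void)s; return 0; }
static int qpdemo_hw_fail(const struct qpdemo_setting *s) { (void)s; return -EIO; }
```

Holding both the old and the new association until the outcome is known is what lets a failed modify leave the QP under its previous access-control settings.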

22 Apr, 2017 · 1 commit

IB/core: Fix kernel crash during fail to initialize device · 4be3a4fa

Committed by Parav Pandit on Mar 19, 2017

This patch fixes a kernel crash that occurs in ib_dealloc_device()
when it is called because a provider driver fails with an error after
ib_alloc_device() and before it can register using ib_register_device().

The crash was seen in the lab, as shown below, and can occur with any IB
device that fails to perform its device initialization before invoking
ib_register_device().

This patch avoids touching the cache and port immutable structures if the
device is not yet initialized.
It also releases related memory when cache and port immutable data
structure initialization fails during register_device().

[81416.561946] BUG: unable to handle kernel NULL pointer dereference at (null)
[81416.570340] IP: ib_cache_release_one+0x29/0x80 [ib_core]
[81416.576222] PGD 78da66067
[81416.576223] PUD 7f2d7c067
[81416.579484] PMD 0
[81416.582720]
[81416.587242] Oops: 0000 [#1] SMP
[81416.722395] task: ffff8807887515c0 task.stack: ffffc900062c0000
[81416.729148] RIP: 0010:ib_cache_release_one+0x29/0x80 [ib_core]
[81416.735793] RSP: 0018:ffffc900062c3a90 EFLAGS: 00010202
[81416.741823] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[81416.749785] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff880859fec000
[81416.757757] RBP: ffffc900062c3aa0 R08: ffff8808536e5ac0 R09: ffff880859fec5b0
[81416.765708] R10: 00000000536e5c01 R11: ffff8808536e5ac0 R12: ffff880859fec000
[81416.773672] R13: 0000000000000000 R14: ffff8808536e5ac0 R15: ffff88084ebc0060
[81416.781621] FS:  00007fd879fab740(0000) GS:ffff88085fac0000(0000) knlGS:0000000000000000
[81416.790522] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[81416.797094] CR2: 0000000000000000 CR3: 00000007eb215000 CR4: 00000000003406e0
[81416.805051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[81416.812997] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[81416.820950] Call Trace:
[81416.824226]  ib_device_release+0x1e/0x40 [ib_core]
[81416.829858]  device_release+0x32/0xa0
[81416.834370]  kobject_cleanup+0x63/0x170
[81416.839058]  kobject_put+0x25/0x50
[81416.843319]  ib_dealloc_device+0x25/0x40 [ib_core]
[81416.848986]  mlx5_ib_add+0x163/0x1990 [mlx5_ib]
[81416.854414]  mlx5_add_device+0x5a/0x160 [mlx5_core]
[81416.860191]  mlx5_register_interface+0x8d/0xc0 [mlx5_core]
[81416.866587]  ? 0xffffffffa09e9000
[81416.870816]  mlx5_ib_init+0x15/0x17 [mlx5_ib]
[81416.876094]  do_one_initcall+0x51/0x1b0
[81416.880861]  ? __vunmap+0x85/0xd0
[81416.885113]  ? kmem_cache_alloc_trace+0x14b/0x1b0
[81416.890768]  ? vfree+0x2e/0x70
[81416.894762]  do_init_module+0x60/0x1fa
[81416.899441]  load_module+0x15f6/0x1af0
[81416.904114]  ? __symbol_put+0x60/0x60
[81416.908709]  ? ima_post_read_file+0x3d/0x80
[81416.913828]  ? security_kernel_post_read_file+0x6b/0x80
[81416.920006]  SYSC_finit_module+0xa6/0xf0
[81416.924888]  SyS_finit_module+0xe/0x10
[81416.929568]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[81416.935089] RIP: 0033:0x7fd879494949
[81416.939543] RSP: 002b:00007ffdbc1b4e58 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
[81416.947982] RAX: ffffffffffffffda RBX: 0000000001b66f00 RCX: 00007fd879494949
[81416.955965] RDX: 0000000000000000 RSI: 000000000041a13c RDI: 0000000000000003
[81416.963926] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000001b652a0
[81416.971861] R10: 0000000000000003 R11: 0000000000000202 R12: 00007ffdbc1b3e70
[81416.979763] R13: 00007ffdbc1b3e50 R14: 0000000000000005 R15: 0000000000000000
[81417.008005] RIP: ib_cache_release_one+0x29/0x80 [ib_core] RSP: ffffc900062c3a90
[81417.016045] CR2: 0000000000000000

Fixes: 55aeed06 ("IB/core: Make ib_alloc_device init the kobject")
Fixes: 7738613e ("IB/core: Add per port immutable struct to ib_device")
Cc: <stable@vger.kernel.org> # v4.2+
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
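
The general shape of this fix, a release path that tolerates a device whose cache was never set up, can be illustrated with a minimal sketch. The structure and function names (`reldemo_*`) are hypothetical and the per-port cache is reduced to a plain array; the kernel code differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct reldemo_device {
    int *port_cache;     /* NULL until registration allocates it */
    size_t num_ports;
};

/* Allocation that would normally happen during registration. */
static int reldemo_init_cache(struct reldemo_device *dev, size_t num_ports)
{
    assert(dev != NULL && num_ports > 0);
    dev->port_cache = calloc(num_ports, sizeof(*dev->port_cache));
    if (!dev->port_cache)
        return -1;
    dev->num_ports = num_ports;
    return 0;
}

/* Release path: returns true only if there was something to free. */
static bool reldemo_release(struct reldemo_device *dev)
{
    assert(dev != NULL);
    if (!dev->port_cache)
        return false;    /* never initialized, so nothing to touch */
    free(dev->port_cache);
    dev->port_cache = NULL;
    dev->num_ports = 0;
    return true;
}
```

Because the release function checks its own precondition, a driver that fails between allocation and registration can still drop the device safely.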

25 Mar, 2017 · 2 commits

IB/device: Convert ib-comp-wq to be CPU-bound · b7363e67

Committed by Sagi Grimberg on Mar 08, 2017

This workqueue is used by our storage target mode ULPs
via the new CQ API. Recent observations when working
with very high-end flash storage devices reveal that
UNBOUND workqueue threads can migrate between cpu cores
and even numa nodes (although some numa locality is accounted
for).

While this attribute can be useful in some workloads,
it does not fit in very nicely with the normal
run-to-completion model we usually use in our target-mode
ULPs and the block-mq irq<->cpu affinity facilities.

The whole block-mq concept is that the completion will
land on the same cpu where the submission was performed.
The fact that our submitter thread is migrating cpus
can break this locality.

We assume that as a target mode ULP, we will serve multiple
initiators/clients and we can spread the load enough without
having to use unbound kworkers.

Also, while we're at it, expose this workqueue via sysfs which
is harmless and can be useful for debug.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Restore I/O MMU, s390 and powerpc support · 0957c29f

Committed by Bart Van Assche on Mar 07, 2017

Avoid that the following error message is reported on the console
while loading an RDMA driver with I/O MMU support enabled:

DMAR: Allocating domain for mlx5_0 failed

Ensure that DMA mapping operations that use to_pci_dev() to
access struct pci_dev see the correct PCI device. E.g. the s390
and powerpc DMA mapping operations use to_pci_dev() even with I/O
MMU support disabled.

This patch preserves the following changes of the DMA mapping updates
patch series:
- Introduction of dma_virt_ops.
- Removal of ib_device.dma_ops.
- Removal of struct ib_dma_mapping_ops.
- Removal of an if-statement from each ib_dma_*() operation.
- IB HW drivers no longer set dma_device directly.
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Parav Pandit <parav@mellanox.com>
Fixes: commit 99db9494 ("IB/core: Remove ib_device.dma_device")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: parav@mellanox.com
Tested-by: parav@mellanox.com
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

28 Jan, 2017 · 1 commit

IB/core: Add inline function to validate port · 24dc831b

Committed by Yuval Shaia on Jan 25, 2017

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

25 Jan, 2017 · 2 commits

IB/core: Remove ib_device.dma_device · 99db9494

Committed by Bart Van Assche on Jan 20, 2017

Add code in ib_register_device() for copying the DMA masks. Use
&ib_device.dev in DMA mapping operations instead of dma_device.
Remove ib_device.dma_device because, with this and previous patches,
it is no longer used.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Initialize ib_device.dev.parent earlier · 97a9ea84

Committed by Bart Van Assche on Jan 20, 2017

Move the ib_device.dev.parent initialization code from
ib_device_register_sysfs() to ib_register_device(). Additionally,
allow HBA drivers to set ib_device.dev.parent without setting
ib_device.dma_device. This is the first step towards removing
ib_device.dma_device.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

11 Jan, 2017 · 1 commit

IB/core: added support to use rdma cgroup controller · 43579b5f

Committed by Parav Pandit on Jan 10, 2017

Added support APIs for IB core to register/unregister every IB/RDMA
device with rdma cgroup for tracking rdma resources.
IB core registers with rdma cgroup controller.
Added support APIs for uverbs layer to make use of rdma controller.
Added uverbs layer to perform resource charge/uncharge functionality.
Added support during query_device uverb operation to ensure it
returns resource limits by honoring rdma cgroup configured limits.
Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

04 Dec, 2016 · 1 commit

IB/core: Remove debug prints after allocation failure · a0b3455f

Committed by Leon Romanovsky on Nov 03, 2016

The prints after [k|v][m|z|c]alloc() functions are not needed,
because in case of failure, the allocator prints its own
error messages anyway.
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

24 Jun, 2016 · 1 commit

IB/core: Add get FW version string to the core · 5fa76c20

Committed by Ira Weiny on Jun 15, 2016

Allow for a common core function to get firmware version strings
from the individual devices.

In later patches this format can then be used to pass a
properly formatted version string through the IPoIB layer.

The problem with the current code in the IPoIB layer is that it is
specific to certain hardware types.

Furthermore, this gives us a common function through which the core
can provide a common sysfs entry.  Eventually we may want to
remove the sysfs export but this provides for user space backwards
compatibility.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

07 Jun, 2016 · 2 commits

IB/core: Fix query port failure in RoCE · d7012467

Committed by Eli Cohen on Jun 04, 2016

Currently ib_query_port always attempts to read the subnet prefix by
calling ib_query_gid(). For RoCE/iWARP there is no subnet manager and no
subnet prefix. Fix this by querying GID[0] only for IB networks.

Fixes: fad61ad4 ('IB/core: Add subnet prefix to port info')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: fix an error code in ib_core_init() · da1f857b

Committed by Dan Carpenter on May 31, 2016

We should return the error code if ib_add_ibnl_clients() fails.  The
current code returns success.

Fixes: 735c631a ('IB/core: Register SA ibnl client during ib_core initialization')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

25 May, 2016 · 5 commits

IB/core: Add IP to GID netlink offload · ae43f828

Committed by Mark Bloch on May 19, 2016

There is an assumption that rdmacm is used only between nodes
in the same IB subnet, which is why ARP resolution can be used to turn
an IP into a GID in rdmacm.

When dealing with IB communication between subnets this assumption
is no longer valid. ARP resolution will get us the next hop device
address and not the peer node's device address.

To solve this issue, we ask user space whether it can provide the
GID of the peer node, and fail if not.

We add a sequence number to identify each request and fill in the GID
upon answer from userspace.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Register SA ibnl client during ib_core initialization · 735c631a

Committed by Mark Bloch on May 19, 2016

Move SA ibnl client registration to ib_core module init.
This will allow us to register a single client to handle
all RDMA_NL_LS operations and make it SA independent.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/SA: Integrate ib_sa module into ib_core module · c2e49c92

Committed by Mark Bloch on May 19, 2016

Consolidate ib_sa into ib_core. This commit eliminates
ib_sa.ko and makes it part of ib_core.ko.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/MAD: Integrate ib_mad module into ib_core module · 4c2cb422

Committed by Mark Bloch on May 19, 2016

Consolidate ib_mad into ib_core. This commit eliminates
ib_mad.ko and makes it part of ib_core.ko.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Integrate IB address resolution module into core · e3f20f02

Committed by Leon Romanovsky on May 19, 2016

IB address resolution is declared as a module (ib_addr.ko) which loads
itself before the IB core module (ib_core.ko).

This leads to a scenario where IB netlink, which is initialized by IB
core, can't be used by ib_addr.ko.

In order to solve it, we are converting ib_addr.ko to be part of
the IB core module.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

22 Mar, 2016 · 1 commit

IB/core: Add subnet prefix to port info · fad61ad4

Committed by Eli Cohen on Mar 11, 2016

The subnet prefix is a part of the port_info MAD returned and should be
available in the ib_port_attr struct. We define it here and provide a
default implementation in case the hardware driver does not provide one.
The subnet prefix is required when creating the address vector to access
the SA in networks where GRH must be used.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

03 Mar, 2016 · 1 commit

IB/core: trivial prink cleanup. · aba25a3e

Committed by Parav Pandit on Mar 02, 2016

1. Replaced printk with appropriate pr_warn, pr_err, pr_info.
2. Removed unnecessary prints around memory allocation failure,
as reported by the checkpatch script.
Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

01 Mar, 2016 · 1 commit

IB/core: Fix missed clean call in registration path · 5adebafb

Committed by Leon Romanovsky on Feb 21, 2016

If the query function fails during
IB device registration, we need to clean up the IB cache, which
was missed.

This change fixes it.

Fixes: 3e153a93 ('IB/core: Save the device attributes on the device structure')
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

23 Dec, 2015 · 3 commits

IB/core: Add gid_type to gid attribute · b39ffa1d

Committed by Matan Barak on Dec 23, 2015

In order to support multiple GID types, we need to store the gid_type
with each GID. This is also aligned with the RoCE v2 annex: "RoCEv2 PORT
GID table entries shall have a "GID type" attribute that denotes the L3
Address type". The currently supported GID type is IB_GID_TYPE_IB, which is
also the RoCE v1 GID type.

This implies that gid_type should be added to roce_gid_table meta-data.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Remove ib_query_device · 182a2da0

Committed by Or Gerlitz on Dec 18, 2015

The copy of the attributes present on the device is now used by all consumers
except for uverbs when serving a user-space query, where dev->query_device
is called.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Save the device attributes on the device structure · 3e153a93

Committed by Ira Weiny on Dec 18, 2015

This way both the IB core and upper level drivers can access these cached
device attributes rather than querying or caching them on their own.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

12 Dec, 2015 · 1 commit

IB: add a proper completion queue abstraction · 14d3a3b2

Committed by Christoph Hellwig on Dec 11, 2015

This adds an abstraction that allows ULPs to simply pass a completion
object and completion callback with each submitted WR and let the RDMA
core handle the nitty gritty details of how to handle completion
interrupts and poll the CQ.

In detail there is a new ib_cqe structure which just contains the
completion callback, and which can be used to get at the containing
object using container_of.  It is pointed to by the WR and WC as an
alternative to the wr_id field, similar to how many ULPs already use
the field to store a pointer using casts.

A driver using the new completion callbacks allocates its CQs using
the new ib_create_cq API, which in addition to the number of CQEs and
the completion vectors also takes a mode on how we poll for CQEs.
Three modes are available: direct for drivers that never take CQ
interrupts and just poll for them, softirq to poll from softirq context
using the to be renamed blk-iopoll infrastructure which takes care of
rearming and budgeting, or a workqueue for consumers who want to be
called from user context.

Thanks a lot to Sagi Grimberg who helped reviewing the API, wrote
the current version of the workqueue code because my two previous
attempts sucked too much and converted the iSER initiator to the new
API.
Signed-off-by: Christoph Hellwig <hch@lst.de>
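
The pattern this commit describes, a small completion entry that holds only a callback and is embedded in the caller's own object, can be sketched in user-space C. The names (`cqedemo_*`, `demo_container_of`) and the simplified work-completion struct are assumptions for illustration; the kernel's structures and callback signature are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of container_of; the kernel provides its own macro. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct cqedemo_wc;

/* Completion entry: holds only the callback. */
struct cqedemo_cqe {
    void (*done)(const struct cqedemo_wc *wc);
};

/* Hypothetical work completion that points back at the completion entry. */
struct cqedemo_wc {
    struct cqedemo_cqe *cqe;
    int status;
};

/* A request embeds the entry, so no integer-to-pointer cast is needed. */
struct cqedemo_request {
    int tag;
    int completed_status;
    struct cqedemo_cqe cqe;
};

/* Callback: recover the containing request from the embedded entry. */
static void cqedemo_request_done(const struct cqedemo_wc *wc)
{
    struct cqedemo_request *req =
        demo_container_of(wc->cqe, struct cqedemo_request, cqe);

    req->completed_status = wc->status;
}

/* What a core poll loop would do per completion: invoke the callback. */
static void cqedemo_dispatch(const struct cqedemo_wc *wc)
{
    assert(wc != NULL && wc->cqe != NULL && wc->cqe->done != NULL);
    wc->cqe->done(wc);
}
```

Compared with stuffing a pointer into an integer id field, the embedded entry keeps the type information and lets each request type supply its own handler.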

22 Oct, 2015 · 2 commits

IB/core: Expose and rename ib_find_cached_gid_by_port cache API · d300ec52

Committed by Matan Barak on Oct 15, 2015

Sometimes consumers might want to search for a GID in a specific port.
For example, when a WC arrives and we want to search the GID
that matches that port - it's better to search only the relevant
port.
Expose and rename ib_cache_gid_find_by_port in order to match
the naming convention of the module.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Add netdev and gid attributes paramteres to cache · 55ee3ab2

Committed by Matan Barak on Oct 15, 2015

Add the ability to query the IB cache by a netdev and get the
attributes of a GID. These parameters are necessary in order to
successfully resolve the required GID (when the netdevice is known)
and get the Ethernet L2 attributes from a GID.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

31 Aug, 2015 · 6 commits

IB/core: Remove needless bracketization · b8071ad8

Committed by Doug Ledford on Aug 15, 2015

Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: missing curly braces in ib_find_gid() · 98d25afa

Committed by Dan Carpenter on Aug 18, 2015

Smatch says that, based on the indenting, we should probably add curly
braces here.

Fixes: 03db3a2d ('IB/core: Add RoCE GID table management')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Add RoCE GID table management · 03db3a2d

Committed by Matan Barak on Jul 30, 2015

RoCE GIDs are based on IP addresses configured on Ethernet net-devices
which relate to the RDMA (RoCE) device port.

Currently, each of the low-level drivers that support RoCE (ocrdma,
mlx4) manages its own RoCE port GID table. As there's nothing which is
essentially vendor specific, we generalize that, and enhance the RDMA
core GID cache to do this job.

In order to populate the GID table, we listen for events:

(a) netdev up/down/change_addr events - if a netdev is built onto
    our RoCE device, we need to add/delete its IPs. This involves
    adding all GIDs related to this ndev, add default GIDs, etc.

(b) inet events - add new GIDs (according to the IP addresses)
    to the table.

For programming the port RoCE GID table, providers must implement
the add_gid and del_gid callbacks.

RoCE GID management requires us to state the associated net_device
alongside the GID. This information is necessary in order to manage
the GID table. For example, when a net_device is removed, its
associated GIDs need to be removed as well.

RoCE mandates generating a default GID for each port, based on the
related net-device's IPv6 link local. In contrast to the GID based on
the regular IPv6 link-local (as we generate GID per IP address),
the default GID is also available when the net device is down (in
order to support loopback).

Locking is done as follows:
The patch modifies the GID table code both for new RoCE drivers
implementing the add_gid/del_gid callbacks and for current RoCE and
IB drivers that do not. The flows for updating the table are
different, so the locking requirements are too.

While updating the RoCE GID table, protection against multiple writers is
achieved via mutex_lock(&table->lock). Since writing to a table
requires us to find an entry (possibly a free entry) in the table and
then modify it, this mutex protects both the find_gid and write_gid
ensuring the atomicity of the action.
Each entry in the GID cache is protected by rwlock. In RoCE, writing
(usually results from netdev notifier) involves invoking the vendor's
add_gid and del_gid callbacks, which could sleep.
Therefore, an invalid flag is added for each entry. Updates for RoCE are
done via a workqueue, thus sleeping is permitted.

In IB, updates are done in write_lock_irq(&device->cache.lock), thus
write_gid isn't allowed to sleep and add_gid/del_gid are not called.

When passing net-device into/out-of the GID cache, the device
is always passed held (dev_hold).

The code uses a single work item for updating all RDMA devices,
following a netdev or inet notifier.

The patch moves the cache from being a client (which was incorrect,
as the cache is part of the IB infrastructure) to being explicitly
initialized/freed when a device is registered/removed.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
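
The default GID mentioned in this commit message is based on the net device's IPv6 link-local address. As a sketch, and assuming the link-local address is formed the usual way for Ethernet (the fe80::/64 prefix followed by a modified EUI-64 interface identifier derived from the MAC), the 16 GID bytes could be built like this. The function name `giddemo_default_gid` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build a 16-byte link-local style GID from a 6-byte MAC address.
 * Assumption: fe80::/64 prefix plus a modified EUI-64 interface identifier.
 */
static void giddemo_default_gid(const uint8_t mac[6], uint8_t gid[16])
{
    assert(mac != NULL && gid != NULL);
    memset(gid, 0, 16);
    gid[0] = 0xfe;              /* fe80::/64 link-local prefix */
    gid[1] = 0x80;
    gid[8] = mac[0] ^ 0x02;     /* flip the universal/local bit */
    gid[9] = mac[1];
    gid[10] = mac[2];
    gid[11] = 0xff;             /* ff:fe is inserted in the middle */
    gid[12] = 0xfe;
    gid[13] = mac[3];
    gid[14] = mac[4];
    gid[15] = mac[5];
}
```

Because the result depends only on the MAC address, such a GID can exist even while the net device is down, which matches the loopback motivation given in the message.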

IB/core: Make ib_alloc_device init the kobject · 55aeed06

Committed by Jason Gunthorpe on Aug 04, 2015

This gets rid of the weird in-between state where struct ib_device
was allocated but the kobject didn't work.

Consequently ib_device_release is now guaranteed to be called in
all situations and we needn't duplicate its kfrees on error paths.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Find the network device matching connection parameters · 9268f72d

Committed by Yotam Kenneth on Jul 30, 2015

In the case of IPoIB, and maybe in other cases, the network device is
managed by an upper-layer protocol (ULP). In order to expose this
network device to other users of the IB device, let ULPs implement
a callback that returns the network device according to connection parameters.

The IB device and port, together with the P_Key and the GID should
be enough to uniquely identify the ULP net device. However, in current
kernels there can be multiple IPoIB interfaces created with the same GID.
Furthermore, such a configuration may be desirable to support ipvlan-like
configurations for RDMA CM with IPoIB. To resolve the device in these
cases the code will also take the IP address as an additional input.
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: lock client data with lists_rwsem · 7c1eb45a

Committed by Haggai Eran on Jul 30, 2015

An ib_client callback that is called with the lists_rwsem locked only for
read is protected from changes to the IB client lists, but not from
ib_unregister_device() freeing its client data. This is because
ib_unregister_device() will remove the device from the device list with
lists_rwsem locked for write, but perform the rest of the cleanup,
including the call to remove() without that lock.

Mark client data that is undergoing de-registration with a new going_down
flag in the client data context. Lock the client data list with lists_rwsem
for write in addition to using the spinlock, so that functions calling the
callback would be able to lock only lists_rwsem for read and let callbacks
sleep.

Since ib_unregister_client() now marks the client data context, no need for
remove() to search the context again, so pass the client data directly to
remove() callbacks.
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
