- 15 Jul, 2015 1 commit
-
-
Committed by Haggai Eran
ucma_lock_files() locks the mut mutex on two files, e.g. for migrating an ID. Use mutex_lock_nested() to prevent the warning below.

 =============================================
 [ INFO: possible recursive locking detected ]
 4.1.0-rc6-hmm+ #40 Tainted: G O
 ---------------------------------------------
 pingpong_rpc_se/10260 is trying to acquire lock:
  (&file->mut){+.+.+.}, at: [<ffffffffa047ac55>] ucma_migrate_id+0xc5/0x248 [rdma_ucm]

 but task is already holding lock:
  (&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&file->mut);
   lock(&file->mut);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by pingpong_rpc_se/10260:
  #0: (&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]

 stack backtrace:
 CPU: 0 PID: 10260 Comm: pingpong_rpc_se Tainted: G O 4.1.0-rc6-hmm+ #40
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
  ffff8801f85b63d0 ffff880195677b58 ffffffff81668f49 0000000000000001
  ffffffff825cbbe0 ffff880195677c38 ffffffff810bb991 ffff880100000000
  ffff880100000000 ffff880100000001 ffff8801f85b7010 ffffffff8121bee9
 Call Trace:
  [<ffffffff81668f49>] dump_stack+0x4f/0x6e
  [<ffffffff810bb991>] __lock_acquire+0x741/0x1820
  [<ffffffff8121bee9>] ? dput+0x29/0x320
  [<ffffffff810bcb38>] lock_acquire+0xc8/0x240
  [<ffffffffa047ac55>] ? ucma_migrate_id+0xc5/0x248 [rdma_ucm]
  [<ffffffff8166b901>] ? mutex_lock_nested+0x291/0x3e0
  [<ffffffff8166b6d5>] mutex_lock_nested+0x65/0x3e0
  [<ffffffffa047ac55>] ? ucma_migrate_id+0xc5/0x248 [rdma_ucm]
  [<ffffffff810baeed>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff8166b66e>] ? mutex_unlock+0xe/0x10
  [<ffffffffa047ac55>] ucma_migrate_id+0xc5/0x248 [rdma_ucm]
  [<ffffffffa0478474>] ucma_write+0xa4/0xb0 [rdma_ucm]
  [<ffffffff81200674>] __vfs_write+0x34/0x100
  [<ffffffff8112427c>] ? __audit_syscall_entry+0xac/0x110
  [<ffffffff810ec055>] ? current_kernel_time+0xc5/0xe0
  [<ffffffff812aa4d3>] ? security_file_permission+0x23/0x90
  [<ffffffff8120088d>] ? rw_verify_area+0x5d/0xe0
  [<ffffffff812009bb>] vfs_write+0xab/0x120
  [<ffffffff81201519>] SyS_write+0x59/0xd0
  [<ffffffff8112427c>] ? __audit_syscall_entry+0xac/0x110
  [<ffffffff8166ffee>] system_call_fastpath+0x12/0x76

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
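The idea of taking the same class of mutex on two files can be sketched in user space as follows. This is only an illustration with pthread mutexes: `struct demo_file`, `demo_lock_files()` and `demo_unlock_files()` are hypothetical names, and ordering the two locks by address is an assumption about how a deadlock between two callers is avoided. In the kernel, the second acquisition would be the one annotated with mutex_lock_nested().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-file structure that owns a "mut" mutex. */
struct demo_file {
    pthread_mutex_t mut;
};

/*
 * Lock both files in ascending address order, so that two threads locking
 * the same pair can never take the mutexes in opposite orders.
 */
void demo_lock_files(struct demo_file *a, struct demo_file *b)
{
    assert(a != b);
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->mut);
        pthread_mutex_lock(&b->mut); /* kernel analogue: mutex_lock_nested() */
    } else {
        pthread_mutex_lock(&b->mut);
        pthread_mutex_lock(&a->mut); /* kernel analogue: mutex_lock_nested() */
    }
}

/* Release both mutexes; unlock order does not matter for correctness. */
void demo_unlock_files(struct demo_file *a, struct demo_file *b)
{
    pthread_mutex_unlock(&a->mut);
    pthread_mutex_unlock(&b->mut);
}
```

The nesting annotation does not change locking behaviour; it only tells lockdep that holding two locks of the same class at once is intentional.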
-
- 21 May, 2015 1 commit
-
-
Committed by Ira Weiny
After discussion upstream, it was agreed to transition the usage of iboe in the kernel to roce. This keeps our terminology consistent with what was finalized in the IBTA Annex 16 and IBTA Annex 17 publications.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 19 May, 2015 2 commits
-
-
Committed by Michael Wang
Introduce helper rdma_cap_ib_sa() to help us check if the port of an IB device supports InfiniBand Subnet Administration.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Michael Wang
Use the raw management helpers to rework the route-related part of the IB core CMA.

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 18 Feb, 2015 1 commit
-
-
Committed by Ilya Nelkenbaum
When marshaling a user path to the kernel struct ib_sa_path, we need to zero smac and dmac and set the vlan id to the "no vlan" value. This is to ensure that Ethernet attributes are not used with InfiniBand QPs.

Fixes: dd5f03be ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: Ilya Nelkenbaum <ilyan@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
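A minimal sketch of the clearing step, using a hypothetical `struct demo_path` in place of the kernel's struct ib_sa_path. The value 0xffff for "no vlan" is an assumption of this sketch; the commit message does not state the constant.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6
#define DEMO_NO_VLAN  0xffff /* assumed "no vlan" sentinel for this sketch */

/* Hypothetical subset of a path record carrying Ethernet L2 attributes. */
struct demo_path {
    uint8_t  smac[DEMO_ETH_ALEN];
    uint8_t  dmac[DEMO_ETH_ALEN];
    uint16_t vlan_id;
};

/* Reset all Ethernet attributes so they cannot leak into an IB path. */
void demo_clear_l2_attrs(struct demo_path *p)
{
    assert(p != NULL);
    memset(p->smac, 0, sizeof(p->smac));
    memset(p->dmac, 0, sizeof(p->dmac));
    p->vlan_id = DEMO_NO_VLAN;
}
```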
-
- 19 Jan, 2014 1 commit
-
-
Committed by Moni Shoua
Currently, the IB core and specifically the RDMA-CM assume that IBoE (RoCE) gids encode the related Ethernet netdevice interface MAC address and possibly VLAN id. Change GIDs to be treated as encoding the interface IP address. Since Ethernet layer 2 address parameters are no longer encoded within gids, we have to extend the InfiniBand address structures (e.g. ib_ah_attr) with layer 2 address parameters, namely mac and vlan.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 17 Nov, 2013 1 commit
-
-
Committed by Joe Perches
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 12 Nov, 2013 1 commit
-
-
Committed by Sean Hefty
Problem reported by Avneesh Pant <avneesh.pant@oracle.com>:

It looks like we are triggering a bug in RDMA CM/UCM interaction. The bug specifically hits when we have an incoming connection request and the connecting process dies BEFORE the passive end of the connection can process the request, i.e. it does not call rdma_get_cm_event() to retrieve the initial connection event. We were able to triage this further and have some additional information now.

In the example below when P1 dies after issuing a connect request as the CM id is being destroyed all outstanding connects (to P2) are sent a reject message. We see this reject message being received on the passive end and the appropriate CM ID created for the initial connection message being retrieved in cm_match_req(). The problem is in the ucma_event_handler() code when this reject message is delivered to it and the initial connect message itself HAS NOT been delivered to the client. In fact the client has not even called rdma_get_cm_event() at this stage so we haven't allocated a new ctx in ucma_get_event() and updated the new connection CM_ID to point to the new UCMA context.

This results in the reject message not being dropped in ucma_event_handler() for the new connection request as the (if (!ctx->uid)) block is skipped since the ctx it refers to is the listen CM id context which does have a valid UID associated with it (I believe the new CMID for the connection initially uses the listen CMID -> context when it is created in cma_new_conn_id). Thus the assumption that new events for a connection can get dropped in ucma_event_handler() is incorrect IF the initial connect request has not been retrieved in the first case. We end up getting a CM Reject event on the listen CM ID and our upper layer code asserts (in fact this event does not even have the listen_id set as that only gets set up by librdmacm for connect requests).

The solution is to verify that the cm_id being reported in the event is the same as the cm_id referenced by the ucma context. A mismatch indicates that the ucma context corresponds to the listen. This fix was validated by using a modified version of librdmacm that was able to verify the problem and see that the reject message was indeed dropped after this patch was applied.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 21 Jun, 2013 8 commits
-
-
Committed by Sean Hefty
Allow user space applications to join multicast groups using MGIDs directly. MGIDs may be passed using AF_IB addresses. Since the current multicast join command only supports addresses as large as sockaddr_in6, define a new structure for joining addresses specified using sockaddr_ib.

Since AF_IB allows the user to specify the qkey when resolving a remote UD QP address, when joining the multicast group use the qkey value, if one has been assigned.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Sean Hefty
Allow user space applications to call resolve_addr using AF_IB. To support sockaddr_ib, we need to define a new structure capable of handling the larger address size.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Sean Hefty
Support user space binding to addresses using AF_IB. Since sockaddr_ib is larger than sockaddr_in6, we need to define a larger structure when binding using AF_IB. This time we use sockaddr_storage to cover future cases.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Sean Hefty
Several commands into the RDMA CM from user space are restricted to supporting addresses which fit into a sockaddr_in6 structure: bind address, resolve address, and join multicast.

With the addition of AF_IB, we need to support addresses which are larger than sockaddr_in6. This will be done by adding new commands that exchange address information using sockaddr_storage. However, to support existing applications, we maintain the current commands and structures, but rename them to indicate that they only support IPv4 and v6 addresses.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
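Why sockaddr_storage works as the exchange format can be sketched with the standard socket headers. `demo_pack_addr()` is a hypothetical helper, not part of the rdma_ucm ABI; it only shows that any address no larger than sockaddr_storage can be carried with its family field intact.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Copy an address of arbitrary family into a sockaddr_storage.
 * Returns 0 on success, -1 if the address is too large to fit.
 */
int demo_pack_addr(struct sockaddr_storage *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len > sizeof(*dst))
        return -1;
    memset(dst, 0, sizeof(*dst)); /* no stale bytes are passed along */
    memcpy(dst, src, len);
    return 0;
}
```

A command built around sockaddr_in6 has no room for a larger address, which is why the older commands stay limited to IPv4 and IPv6.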
-
Committed by Sean Hefty
Part of address resolution is mapping IP addresses to IB GIDs. With the changes to support querying larger addresses and more path records, also provide a way to query IB GIDs after resolution completes.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Sean Hefty
The current query_route call can return up to two path records. The assumption being that one is the primary path, with optional support for an alternate path. In both cases, the paths are assumed to be reversible and are used to send CM MADs.

With the ability to manually set IB path data, the rdma cm can eventually be capable of using up to 6 paths per connection: forward primary, reverse primary, forward alternate, reverse alternate, reversible primary path for CM MADs, and reversible alternate path for CM MADs. (It is unclear at this time if IB routing will complicate this.) In order to handle more flexible routing topologies, add a new command to report any number of paths.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Sean Hefty
The sockaddr structure for AF_IB is larger than sockaddr_in6. The rdma cm user space ABI uses the latter to exchange address information between user space and the kernel.

To support querying for larger addresses, define a new query command that exchanges data using sockaddr_storage, rather than sockaddr_in6. Unlike the existing query_route command, the new command only returns address information. Route (i.e. path record) data is separated.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Sean Hefty
Allow the user to specify the qkey when using AF_IB. The qkey is added to struct rdma_ucm_conn_param in place of a reserved field, but for backwards compatibility, is only accessed if the associated rdma_cm_id is using AF_IB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
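The backwards-compatibility rule can be sketched as follows. `demo_pick_qkey()` and the `DEMO_AF_*` constants are hypothetical, and returning 0 for non-AF_IB ids (meaning "the field is ignored") is an assumption of this sketch.

```c
#include <stdint.h>

#define DEMO_AF_INET 2  /* illustrative family values for this sketch */
#define DEMO_AF_IB   27

/*
 * Only honour the user-supplied qkey when the id uses AF_IB. Older
 * applications may leave garbage in what used to be a reserved field,
 * so it must not be read for other address families.
 */
uint32_t demo_pick_qkey(int family, uint32_t user_qkey)
{
    return family == DEMO_AF_IB ? user_qkey : 0;
}
```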
-
- 28 Feb, 2013 1 commit
-
-
Committed by Tejun Heo
Convert to the much saner new idr interface.

v2: Mike triggered WARN_ON() in idr_preload() because send_mad(), which may be used from non-process context, was calling idr_preload() unconditionally. Preload iff @gfp_mask has __GFP_WAIT.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reported-by: "Marciniszyn, Mike" <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 01 Oct, 2012 1 commit
-
-
Committed by Dotan Barak
Remove unused wait objects from ucm/ucma events flow.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 27 Sep, 2012 2 commits
-
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 14 Aug, 2012 1 commit
-
-
Committed by Tatyana Nikolova
It is possible for asynchronous RDMA_CM_EVENT_ESTABLISHED events to be generated with ctx->uid == 0, because ucma_set_event_context() copies ctx->uid to the event structure outside of ctx->file->mut. This leads to a crash in the userspace library, since it gets a bogus event. Fix this by taking the mutex a bit earlier in ucma_event_handler.

Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Sean Hefty <Sean.Hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 28 Jul, 2012 1 commit
-
-
Committed by Roland Dreier
Suggested by scripts/coccinelle/api/memdup_user.cocci.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 09 Jul, 2012 1 commit
-
-
Committed by Sean Hefty
Provide an option for the user to specify that listens should only accept connections where the incoming address family matches that of the locally bound address. This is used to support the equivalent of the IPV6_V6ONLY socket option, which allows an app to only accept connection requests directed to IPv6 addresses.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
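For comparison, this is how the socket option that the new rdma_cm option mirrors is set on an ordinary TCP socket. The sketch uses only the standard sockets API; `demo_open_v6only_socket()` is a hypothetical helper name, and nothing here shows the rdma_cm call itself.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create an IPv6 TCP socket that accepts IPv6 traffic only.
 * Returns the descriptor, or -1 if IPv6 is unavailable or the
 * option cannot be set.
 */
int demo_open_v6only_socket(void)
{
    int on = 1;
    int fd = socket(AF_INET6, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```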
-
- 21 Apr, 2012 2 commits
-
-
Committed by Eric W. Biederman
This results in code with less boilerplate that is a bit easier to read. Additionally stops us from using compatibility code in the sysctl core, hastening the day when the compatibility code can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric W. Biederman
This makes it clearer which sysctls are relative to your current network namespace. This makes it a little less error prone by not exposing sysctls for the initial network namespace in other namespaces. This is the same way we handle all of our other network interfaces to userspace and I can't honestly remember why we didn't do this for sysctls right from the start.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 06 Mar, 2012 1 commit
-
-
Committed by Hefty, Sean
When we destroy a cm_id, we must purge associated events from the event queue. If the cm_id is for a listen request, we also purge corresponding pending connect requests. This requires destroying the cm_id's associated with the connect requests by calling rdma_destroy_id(). rdma_destroy_id() blocks until all outstanding callbacks have completed.

The issue is that we hold file->mut while purging events from the event queue. We also acquire file->mut in our event handler. Calling rdma_destroy_id() while holding file->mut can lead to a deadlock, since the event handler callback cannot acquire file->mut, which prevents rdma_destroy_id() from completing.

Fix this by moving events to purge from the event queue to a temporary list. We can then release file->mut and call rdma_destroy_id() outside of holding any locks.

Bug report by Or Gerlitz <ogerlitz@mellanox.com>:

 [ INFO: possible circular locking dependency detected ]
 3.3.0-rc5-00008-g79f1e43-dirty #34 Tainted: G I

 tgtd/9018 is trying to acquire lock:
  (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]

 but task is already holding lock:
  (&file->mut){+.+.+.}, at: [<ffffffffa02470fe>] ucma_free_ctx+0xb6/0x196 [rdma_ucm]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&file->mut){+.+.+.}:
  [<ffffffff810682f3>] lock_acquire+0xf0/0x116
  [<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6
  [<ffffffffa0247636>] ucma_event_handler+0x148/0x1dc [rdma_ucm]
  [<ffffffffa035a79a>] cma_ib_handler+0x1a7/0x1f7 [rdma_cm]
  [<ffffffffa0333e88>] cm_process_work+0x32/0x119 [ib_cm]
  [<ffffffffa03362ab>] cm_work_handler+0xfb8/0xfe5 [ib_cm]
  [<ffffffff810423e2>] process_one_work+0x2bd/0x4a6
  [<ffffffff810429e2>] worker_thread+0x1d6/0x350
  [<ffffffff810462a6>] kthread+0x84/0x8c
  [<ffffffff81369624>] kernel_thread_helper+0x4/0x10

 -> #0 (&id_priv->handler_mutex){+.+.+.}:
  [<ffffffff81067b86>] __lock_acquire+0x10d5/0x1752
  [<ffffffff810682f3>] lock_acquire+0xf0/0x116
  [<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6
  [<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
  [<ffffffffa024715f>] ucma_free_ctx+0x117/0x196 [rdma_ucm]
  [<ffffffffa0247255>] ucma_close+0x77/0xb4 [rdma_ucm]
  [<ffffffff810df6ef>] fput+0x117/0x1cf
  [<ffffffff810dc76e>] filp_close+0x6d/0x78
  [<ffffffff8102b667>] put_files_struct+0xbd/0x17d
  [<ffffffff8102b76d>] exit_files+0x46/0x4e
  [<ffffffff8102d057>] do_exit+0x299/0x75d
  [<ffffffff8102d599>] do_group_exit+0x7e/0xa9
  [<ffffffff8103ae4b>] get_signal_to_deliver+0x536/0x555
  [<ffffffff81001717>] do_signal+0x39/0x634
  [<ffffffff81001d39>] do_notify_resume+0x27/0x69
  [<ffffffff81361c03>] retint_signal+0x46/0x83

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&file->mut);
                                lock(&id_priv->handler_mutex);
                                lock(&file->mut);
   lock(&id_priv->handler_mutex);

  *** DEADLOCK ***

 1 lock held by tgtd/9018:
  #0: (&file->mut){+.+.+.}, at: [<ffffffffa02470fe>] ucma_free_ctx+0xb6/0x196 [rdma_ucm]

 stack backtrace:
 Pid: 9018, comm: tgtd Tainted: G I 3.3.0-rc5-00008-g79f1e43-dirty #34
 Call Trace:
  [<ffffffff81029e9c>] ? console_unlock+0x18e/0x207
  [<ffffffff81066433>] print_circular_bug+0x28e/0x29f
  [<ffffffff81067b86>] __lock_acquire+0x10d5/0x1752
  [<ffffffff810682f3>] lock_acquire+0xf0/0x116
  [<ffffffffa0359a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
  [<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6
  [<ffffffffa0359a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
  [<ffffffff8106546d>] ? trace_hardirqs_on_caller+0x11e/0x155
  [<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
  [<ffffffffa024715f>] ucma_free_ctx+0x117/0x196 [rdma_ucm]
  [<ffffffffa0247255>] ucma_close+0x77/0xb4 [rdma_ucm]
  [<ffffffff810df6ef>] fput+0x117/0x1cf
  [<ffffffff810dc76e>] filp_close+0x6d/0x78
  [<ffffffff8102b667>] put_files_struct+0xbd/0x17d
  [<ffffffff8102b5cc>] ? put_files_struct+0x22/0x17d
  [<ffffffff8102b76d>] exit_files+0x46/0x4e
  [<ffffffff8102d057>] do_exit+0x299/0x75d
  [<ffffffff8102d599>] do_group_exit+0x7e/0xa9
  [<ffffffff8103ae4b>] get_signal_to_deliver+0x536/0x555
  [<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff81001717>] do_signal+0x39/0x634
  [<ffffffff8135e037>] ? printk+0x3c/0x45
  [<ffffffff8106546d>] ? trace_hardirqs_on_caller+0x11e/0x155
  [<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff81361803>] ? _raw_spin_unlock_irq+0x2b/0x40
  [<ffffffff81039011>] ? set_current_blocked+0x44/0x49
  [<ffffffff81361bce>] ? retint_signal+0x11/0x83
  [<ffffffff81001d39>] do_notify_resume+0x27/0x69
  [<ffffffff8118a1fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff81361c03>] retint_signal+0x46/0x83

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
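The fix pattern (unlink under the lock, do the blocking teardown after dropping it) can be sketched in user space with a pthread mutex and a singly linked list. All names are hypothetical, and free() merely stands in for the blocking rdma_destroy_id() call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical queued event, tagged with the context it belongs to. */
struct demo_event {
    struct demo_event *next;
    int ctx_id;
};

struct demo_file {
    pthread_mutex_t mut;
    struct demo_event *events;
};

/* Queue an event at the head of the list. Returns 0, or -1 on OOM. */
int demo_queue_event(struct demo_file *file, int ctx_id)
{
    struct demo_event *ev = malloc(sizeof(*ev));

    if (ev == NULL)
        return -1;
    ev->ctx_id = ctx_id;
    pthread_mutex_lock(&file->mut);
    ev->next = file->events;
    file->events = ev;
    pthread_mutex_unlock(&file->mut);
    return 0;
}

/* Purge all events of ctx_id; returns how many were destroyed. */
int demo_purge_events(struct demo_file *file, int ctx_id)
{
    struct demo_event *doomed = NULL, **pp, *ev;
    int destroyed = 0;

    /* Phase 1: under the lock, only move matching events aside. */
    pthread_mutex_lock(&file->mut);
    pp = &file->events;
    while ((ev = *pp) != NULL) {
        if (ev->ctx_id == ctx_id) {
            *pp = ev->next;
            ev->next = doomed;
            doomed = ev;
        } else {
            pp = &ev->next;
        }
    }
    pthread_mutex_unlock(&file->mut);

    /* Phase 2: no lock held, so a blocking destroy cannot deadlock. */
    while ((ev = doomed) != NULL) {
        doomed = ev->next;
        free(ev); /* stand-in for the blocking destroy call */
        destroyed++;
    }
    return destroyed;
}
```

Because the event handler also needs the file mutex, keeping phase 2 outside the critical section is what breaks the circular dependency shown in the report.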
-
- 28 Jan, 2012 1 commit
-
-
Committed by Sean Hefty
After reporting a new connection request to user space, the rdma_ucm will discard subsequent events until the user has associated a user space identifier with the kernel cm_id. This is needed to avoid reporting a reject/disconnect event to the user for a request that they may not have processed.

The user space identifier is set once the user tries to accept the connection request. However, the following race exists in ucma_accept():

 ctx->uid = cmd.uid;
 <events may be reported now>
 ret = rdma_accept(ctx->cm_id, ...);

Once ctx->uid has been set, new events may be reported to the user. While the above mentioned race is avoided, there is an issue that the user _may_ receive a reject/disconnect event if rdma_accept() fails, depending on when the event is processed. To simplify the use of rdma_accept(), discard all events unless rdma_accept() succeeds.

This problem was discovered based on questions from Roland Dreier <roland@purestorage.com>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
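One way to read the fix is that the identifier is published only after the accept has succeeded, with both steps under the same mutex the event handler takes. The sketch below is a user-space model under that assumption; the structure, the stub accept functions and `demo_accept()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct demo_ctx {
    pthread_mutex_t mut; /* plays the role of file->mut */
    uint64_t uid;        /* 0 means events for this id are discarded */
};

/* Stubs standing in for the accept call: 0 on success, negative on error. */
int demo_accept_ok(void)   { return 0; }
int demo_accept_fail(void) { return -1; }

/*
 * Accept a connection and, only on success, publish the user identifier.
 * Holding the mutex across both steps means an event handler taking the
 * same mutex never observes a uid for a connection whose accept failed.
 */
int demo_accept(struct demo_ctx *ctx, uint64_t uid, int (*do_accept)(void))
{
    int ret;

    assert(ctx != NULL && do_accept != NULL);
    pthread_mutex_lock(&ctx->mut);
    ret = do_accept();
    if (ret == 0)
        ctx->uid = uid; /* events may be reported from here on */
    pthread_mutex_unlock(&ctx->mut);
    return ret;
}
```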
-
- 01 Nov, 2011 1 commit
-
-
Committed by Paul Gortmaker
They had been getting it implicitly via device.h but we can't rely on that for the future, due to a pending cleanup so fix it now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
-
- 14 Oct, 2011 1 commit
-
-
Committed by Sean Hefty
Allow the user to indicate the QP type separately from the port space when allocating an rdma_cm_id. With RDMA_PS_IB, there is no longer a 1:1 relationship between the QP type and port space, so we need to switch on the QP type to select between UD and connected QPs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 07 Oct, 2011 1 commit
-
-
Committed by Hefty, Sean
cmd is unsigned, no need to check for < 0. Found by code inspection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
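Why the lower-bound check is redundant can be shown with a short sketch. `demo_cmd_valid()` and the table size are hypothetical; the point is that a negative value converted to an unsigned type becomes a very large number, so the upper-bound check alone rejects it.

```c
/*
 * Validate a command index against the size of a dispatch table.
 * Because cmd is unsigned, "cmd < 0" can never be true; any value that
 * started out negative wraps to a huge number and fails the range check.
 */
int demo_cmd_valid(unsigned int cmd, unsigned int table_size)
{
    return cmd < table_size;
}
```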
-
- 26 May, 2011 1 commit
-
-
Committed by Sean Hefty
The RDMA CM currently infers the QP type from the port space selected by the user. In the future (eg with RDMA_PS_IB or XRC), there may not be a 1-1 correspondence between port space and QP type. For netlink export of RDMA CM state, we want to export the QP type to userspace, so it is cleaner to explicitly associate a QP type to an ID.

Modify rdma_create_id() to allow the user to specify the QP type, and use it to make our selections of datagram versus connected mode.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 24 May, 2011 1 commit
-
-
Committed by Roland Dreier
We want udev to create a device node under /dev/infiniband with permission 0666 for rdma_cm, so add that info to our struct miscdevice.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
-
- 10 May, 2011 1 commit
-
-
Committed by Hefty, Sean
Lustre requires that clients bind to a privileged port number before connecting to a remote server. On larger clusters (typically more than about 1000 nodes), the number of privileged ports is exhausted, resulting in lustre being unusable.

To handle this, we add support for reusable addresses to the rdma_cm. This mimics the behavior of the socket option SO_REUSEADDR. A user may set an rdma_cm_id to reuse an address before calling rdma_bind_addr() (explicitly or implicitly). If set, other rdma_cm_id's may be bound to the same address, provided that they all have reuse enabled, and there are no active listens.

If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it will only succeed if there are no other id's bound to that same address. The reuse option is exported to user space. The behavior of the kernel reuse implementation was verified against that given by sockets.

This patch is derived from a patch by Ira Weiny <weiny2@llnl.gov>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 29 Jan, 2011 1 commit
-
-
Committed by Steve Wise
For iWARP rdma_cm ids, the "route" information is the L2 src and next hop addresses.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 26 Oct, 2010 1 commit
-
-
Committed by Eli Cohen
Add 802.1q VLAN support to IBoE. The VLAN tag is encoded within the GID derived from a link local address in the following way: GID[11] and GID[12] contain the VLAN ID when the GID contains a VLAN. The 3-bit user priority field of the packets is identical to the 3 bits of the SL.

In case of rdma_cm apps, the TOS field is used to generate the SL field by doing a shift right of 5 bits, effectively taking the 3 MS bits of the TOS field.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 24 Oct, 2010 1 commit
-
-
Committed by Steve Wise
For iWARP connections, the connect request is carried in a TCP payload on an already established TCP connection. So if the ucma's backlog is full, the connection request is transmitted and acked at the TCP level by the time the connect request gets dropped in the ucma. The end result is the connection gets rejected by the iWARP provider.

Further, a 32 node 256NP OpenMPI job will generate > 128 connect requests on some ranks. This patch increases the default max backlog to 1024, and adds a sysctl variable so the backlog can be adjusted at run time.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 14 Oct, 2010 1 commit
-
-
Committed by Eli Cohen
Add support for IBoE device binding and IP --> GID resolution. Path resolving and multicast joining are implemented within cma.c by filling in the responses and running callbacks in the CMA work queue.

IP --> GID resolution always yields IPv6 link local addresses; remote GIDs are derived from the destination MAC address of the remote port. Multicast GIDs are always mapped to multicast MACs as is done in IPv6. (IPv4 multicast is enabled by translating IPv4 multicast addresses to IPv6 multicast as described in <http://www.mail-archive.com/ipng@sunroof.eng.sun.com/msg02134.html>.)

Some helper functions are added to ib_addr.h.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
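The "as is done in IPv6" mapping of a multicast address to a multicast MAC can be sketched as below: the MAC is 33:33 followed by the low-order 32 bits of the address. `demo_mgid_to_mac()` is a hypothetical helper, and applying the standard IPv6 rule byte for byte to a 16-byte multicast GID is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Map a 16-byte multicast GID to an Ethernet multicast MAC using the
 * IPv6 rule: 33:33 followed by the last four bytes of the address.
 */
void demo_mgid_to_mac(const uint8_t mgid[16], uint8_t mac[6])
{
    assert(mgid[0] == 0xff); /* multicast addresses start with 0xff */
    mac[0] = 0x33;
    mac[1] = 0x33;
    memcpy(&mac[2], &mgid[12], 4);
}
```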
-
- 22 Apr, 2010 1 commit
-
-
Committed by Roland Dreier
Several RDMA user-access drivers have file_operations structures with no .llseek method set. None of the drivers actually do anything with f_pos, so this means llseek is essentially a NOP, instead of returning an error as leaving other file_operations methods unimplemented would do. This is mostly harmless, except that a NULL .llseek means that default_llseek() is used, and this function grabs the BKL, which we would like to avoid.

Since llseek does nothing useful on these files, we would like it to return an error to userspace instead of silently grabbing the BKL and succeeding. For nearly all of the file types, we take the belt-and-suspenders approach of setting the .llseek method to no_llseek and also calling nonseekable_open(); the exception is the uverbs_event files, which are created with anon_inode_getfile(), which already sets f_mode the same way as nonseekable_open() would.

This work is motivated by Arnd Bergmann's bkl-removal tree.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
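What "return an error to userspace" looks like from the application side can be sketched with any non-seekable descriptor. The example below uses a pipe because it needs no RDMA hardware; it does not open the RDMA device files themselves, and `demo_seek_errno_on_pipe()` is a hypothetical helper.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to seek on a pipe and report the resulting errno.
 * Returns the errno from lseek(), 0 if the seek unexpectedly succeeded,
 * or -1 if the pipe could not be created.
 */
int demo_seek_errno_on_pipe(void)
{
    int fds[2];
    int err = 0;

    if (pipe(fds) < 0)
        return -1;
    if (lseek(fds[0], 0, SEEK_SET) == (off_t)-1)
        err = errno; /* ESPIPE for non-seekable descriptors */
    close(fds[0]);
    close(fds[1]);
    return err;
}
```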
-
- 30 Mar, 2010 1 commit
-
-
Committed by Tejun Heo
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion.

 http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include blocks and tries to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files.

2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files.

3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually widely available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable on most builds of the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
-
- 20 Nov, 2009 1 commit
-
-
Committed by Sean Hefty
The RDMA CM is intended to support the use of a loopback address when establishing a connection; however, the behavior of the CM when loopback addresses are used is confusing and does not always work, depending on whether loopback was specified by the server, the client, or both.

The defined behavior of rdma_bind_addr is to associate an RDMA device with an rdma_cm_id, as long as the user specified a non-zero address. (ie they weren't just trying to reserve a port) Currently, if the loopback address is passed to rdma_bind_addr, no device is associated with the rdma_cm_id. Fix this.

If a loopback address is specified by the client as the destination address for a connection, it will fail to establish a connection. This is true even if the server is listening across all addresses or on the loopback address itself. The issue is that the server tries to translate the IP address carried in the REQ message to a local net_device address, which fails. The translation is not needed in this case, since the REQ carries the actual HW address that should be used.

Finally, cleanup loopback support to be more transport neutral. Replace separate calls to get/set the sgid and dgid from the device address to a single call that behaves correctly depending on the format of the device address. And support both IPv4 and IPv6 address formats.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
[ Fixed RDS build by s/ib_addr_get/rdma_addr_get/ - Roland ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
-