- 25 May, 2016 6 commits
-
-
Committed by Mark Bloch
rdmacm assumes that it is used only between nodes in the same IB subnet, which is why ARP resolution can be used to translate an IP address to a GID. For IB communication between subnets this assumption no longer holds: ARP resolution returns the next-hop device address, not the peer node's device address. To solve this, we ask user space whether it can provide the GID of the peer node, and fail if it cannot. We add a sequence number to identify each request and fill in the GID when the answer arrives from user space. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Mark Bloch
Move SA ibnl client registration to ib_core module init. This allows us to register a single client to handle all RDMA_NL_LS operations and makes it SA independent. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Mark Bloch
This commit adds a new RDMA local service operation: IP to GID resolution. The client request includes the ifindex of the outgoing interface and places the destination IP in an attribute (LS_NLA_TYPE_IPV4 or LS_NLA_TYPE_IPV6). The local service answers with a message that carries the attribute LS_NLA_TYPE_DGID, the destination GID. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Mark Bloch
Consolidate ib_sa into ib_core. This commit eliminates ib_sa.ko and makes it part of ib_core.ko. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Mark Bloch
Consolidate ib_mad into ib_core. This commit eliminates ib_mad.ko and makes it part of ib_core.ko. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Leon Romanovsky
IB address resolution is declared as a module (ib_addr.ko) which loads itself before the IB core module (ib_core.ko). As a result, IB netlink, which is initialized by IB core, can't be used by ib_addr.ko. To solve this, we convert ib_addr.ko to be part of the IB core module. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 18 May, 2016 4 commits
-
-
Committed by Matan Barak
Previously, mlx5_ib_cq_comp was executed from interrupt context. Under heavy load, this could keep the CPU core in interrupt context for too long. Instead of executing the handler from interrupt context, we execute it from a much friendlier tasklet context. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Matan Barak
Previously, we fired all our completion callbacks straight from our ISR. Some of those callbacks were lightweight (for example, mlx5 Ethernet napi callbacks), but some of them did more work (for example, the user-space RDMA stack uverbs' completion handler). Besides that, doing more than the minimal work in an ISR is generally considered wrong and can even lead to a hard lockup of the system: when the hardware generates a lot of completion events, the loop over those events can run long enough for the system watchdog to report a hard lockup. To avoid that, add a new way of invoking completion event callbacks. In the interrupt itself, we add the CQs that received a completion event to a per-EQ list and schedule a tasklet. In the tasklet context we loop over all the CQs in the list and invoke the user callback. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
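The split between interrupt and tasklet work described above can be sketched in plain user-space C. This is a single-threaded simulation under stated assumptions, not the mlx5 code: `struct cq`, `struct eq`, `eq_irq_add_cq`, `eq_tasklet_run` and `count_event` are hypothetical names, the `tasklet_scheduled` flag stands in for actually scheduling a tasklet, and the locking a real driver would need is omitted.

```c
#include <stddef.h>

/* Hypothetical completion queue: only what the sketch needs. */
struct cq {
    void (*comp)(struct cq *cq); /* user completion callback */
    struct cq *next;             /* link in the per-EQ pending list */
    int queued;                  /* non-zero while on the pending list */
    int events;                  /* demo counter bumped by the callback */
};

/* Hypothetical event queue holding the CQs that await a callback. */
struct eq {
    struct cq *pending_head;
    struct cq *pending_tail;
    int tasklet_scheduled;       /* stands in for scheduling a tasklet */
};

/* "Interrupt" side: record the CQ once and request deferred work. */
static void eq_irq_add_cq(struct eq *eq, struct cq *cq)
{
    if (!cq->queued) {
        cq->queued = 1;
        cq->next = NULL;
        if (eq->pending_tail)
            eq->pending_tail->next = cq;
        else
            eq->pending_head = cq;
        eq->pending_tail = cq;
    }
    eq->tasklet_scheduled = 1;
}

/* "Tasklet" side: detach the list, then invoke each user callback.
 * Returns the number of callbacks that were run. */
static int eq_tasklet_run(struct eq *eq)
{
    struct cq *cq = eq->pending_head;
    int n = 0;

    eq->pending_head = NULL;
    eq->pending_tail = NULL;
    eq->tasklet_scheduled = 0;

    while (cq) {
        struct cq *next = cq->next;

        cq->queued = 0;
        cq->next = NULL;
        cq->comp(cq);
        n++;
        cq = next;
    }
    return n;
}

/* Demo callback: counts how many times it was invoked. */
static void count_event(struct cq *cq)
{
    cq->events++;
}
```

The interrupt path here does a constant amount of work per event, and a CQ that signals twice before the deferred pass runs is still called back only once.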
-
Committed by Christoph Lameter
In the Ethernet/TCP world, CAP_NET_RAW is sufficient to allow a program to listen to all incoming packets on a specific interface, and the higher CAP_NET_ADMIN is required to set the interface into promiscuous mode. We want to emulate that same basic division of privilege in the RDMA stack, so when dealing with Raw Ethernet QPs, allow apps with CAP_NET_RAW to listen to all incoming flows (and direct them as they see fit in their own listen stream). Do not require CAP_NET_ADMIN just to listen to traffic that is already incoming. Reserve CAP_NET_ADMIN for attempts to set promiscuous mode. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by shamir rabinovitch
The problem is that the function 'send_reply_to_slave' gets 'req_sa_mad' as a pointer whose address is only aligned to 4 bytes but whose type is 8 bytes in size. This can result in unaligned access faults on certain architectures. Sowmini Varadhan pointed to this reply from Dave Miller, which says that memcpy should not be used to solve alignment issues: https://lkml.org/lkml/2015/10/21/352 Optimization of memcpy to an 'ldx' instruction can only happen if the compiler knows that the size of the data we are copying is 8 bytes and it assumes the data is aligned to 8 bytes. If the compiler knows the type is not aligned to 8, it must not optimize the 8-byte copy. Defining the data type as aligned to 4 forces the compiler to treat all accesses as though they aren't aligned and avoids the 'ldx' optimization. Full credit for the idea goes to Jason Gunthorpe <jgunthorpe@obsidianresearch.com>. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
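The effect of declaring a type as aligned to 4 can be illustrated with a small sketch. It assumes GCC or Clang, since it relies on their `packed` and `aligned` attributes; `struct sa_rec` and `read_guid` are hypothetical names chosen for illustration and are not the structure touched by the patch.

```c
#include <stdint.h>

/* Hypothetical record with a 64-bit field. The attributes tell the
 * compiler that objects of this type are only guaranteed to sit on a
 * 4-byte boundary, so it may not assume 8-byte alignment for 'guid'. */
struct sa_rec {
    uint64_t guid;
    uint32_t id;
} __attribute__((packed, aligned(4)));

/* The compiler must emit a load that is safe for a 4-byte-aligned
 * address here, rather than a load that requires 8-byte alignment. */
static uint64_t read_guid(const struct sa_rec *r)
{
    return r->guid;
}
```

With these attributes the type's alignment requirement is 4 and its size is 12, so a pointer to it that is only 4-byte aligned is a valid pointer to the type.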
-
- 14 May, 2016 30 commits
-
-
Committed by Doug Ledford
-
Committed by Majd Dibbiny
Report Scatter FCS support when the firmware supports it as well. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Majd Dibbiny
Enable Scatter FCS in the RQ context when the user passes the Scatter FCS create flag. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Majd Dibbiny
Raw Packet QPs that were created with the Scatter FCS flag will scatter the FCS into the receive buffers. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Majd Dibbiny
The Raw Scatter FCS device capability is set when the NIC supports scattering the FCS to the receive buffers of Raw Packet QPs. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Majd Dibbiny
Since all the uverbs device_cap_flags are occupied, we need a place to expose more device capabilities. This patch adds a new 64-bit device_cap_flags_ex to expose new device capabilities. The lower 32 bits will be identical to the original device_cap_flags; the upper 32 bits will be new capabilities. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
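The layout described above, with the legacy flags mirrored in the low half and new capabilities in the high half, can be sketched as follows. The helper names and the `EX_CAP_EXAMPLE` bit are hypothetical and chosen only for illustration; they are not taken from the uverbs headers.

```c
#include <stdint.h>

/* Hypothetical new capability placed in the upper 32 bits (bit 34). */
#define EX_CAP_EXAMPLE (1ULL << 34)

/* Build the 64-bit value: the low 32 bits mirror the legacy field and
 * only bits 32..63 of new_caps are kept. */
static uint64_t make_cap_flags_ex(uint32_t legacy_flags, uint64_t new_caps)
{
    return (uint64_t)legacy_flags | (new_caps & 0xFFFFFFFF00000000ULL);
}

/* What a consumer that only knows the 32-bit field would see. */
static uint32_t legacy_view(uint64_t flags_ex)
{
    return (uint32_t)flags_ex;
}

/* Test a capability bit in the extended field. */
static int has_cap(uint64_t flags_ex, uint64_t cap)
{
    return (flags_ex & cap) != 0;
}
```

Masking `new_caps` keeps a stray low bit from changing what legacy consumers read, which is the property the commit message relies on.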
-
Committed by Colin Ian King
Passing hw_stats by value requires a 280-byte copy, so passing it by reference instead is much more efficient. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Lars-Peter Clausen
Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race-condition-free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch: // <smpl> @@ expression irq; @@ -synchronize_irq(irq); free_irq(irq, ...); // </smpl> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif#intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Julia Lawall
The i40iw_vf_cqp_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Guy Levi
With this patch, the user-space library can improve performance by using the doorbell ringing method appropriate to the memory type it asked for. Currently only one mapping command is allowed for UARs: MLX5_IB_MMAP_REGULAR_PAGE. Using this mapping, the kernel maps the UARs as write-combining (WC) if the system supports it. If the system does not support WC, the UARs are mapped as non-cached (NC). In this case the user-space library can't tell which mapping was applied. This patch adds 2 new mapping commands: MLX5_IB_MMAP_WC_PAGE and MLX5_IB_MMAP_NC_PAGE. For these commands the kernel maps exactly as requested and fails if it can't. Since there is no generic way to check whether the requested memory region can be mapped as WC, the driver enables conclusive WC mapping only for x86, PowerPC and ARM, which support WC for the device's memory region. Signed-off-by: Guy Levy <guyle@mellanox.com> Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Matan Barak
The current mlx5 code disallows mapping the free running counter of mlx5-based hardware when PROT_EXEC is set. Although this behaviour is correct, Linux does add an implicit VM_EXEC to the vm_flags if the READ_IMPLIES_EXEC bit is set in the process personality. This happens, for example, if the process stack is executable. This causes libmlx5 to output a warning and prevents the user from reading the free running clock. Executing the init segment of the hardware isn't a security risk (at least no more than executing a process's own stack), so we just prevent writes to it. Fixes: d69e3bcf ('IB/mlx5: Mmap the HCA's core clock register to user-space') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Geliang Tang
Simplify the code in search_relocate_mgid0_group by using list_for_each_entry_safe(). Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Mark Bloch
Fixes a direct call to kfree_skb when nlmsg_free should be used. Fixes: 2ca546b9 ('IB/sa: Route SA pathrecord query through netlink') Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Mark Bloch
Fix an array overrun when walking the callback table. The declaration of the callback table does not provide the max size, while the registration phase does. There is a potential scenario where a new operation is added that is not supported by the current client. Acceptance of such an operation by ib_netlink will cause an array overrun. Fixes: 809d5fc9 ("infiniband: pass rdma_cm module to netlink_dump_start") Fixes: b493d91d ("iwcm: common code for port mapper") Fixes: 2ca546b9 ("IB/sa: Route SA pathrecord query through netlink") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
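The overrun scenario can be illustrated with a minimal sketch. It is not the kernel patch: the operation enum, the handlers and `dispatch` are hypothetical. The point it shows is that a table declared with an explicit size leaves unlisted entries as NULL inside the array, whereas an unsized table would simply be shorter than the advertised operation count.

```c
#include <stddef.h>

/* Hypothetical handler type and operation numbers. OP_C plays the role
 * of a newly added operation this client does not implement. */
typedef int (*op_handler)(int arg);

enum { OP_A, OP_B, OP_C, NUM_OPS };

static int handle_a(int x) { return x + 1; }
static int handle_b(int x) { return x * 2; }

/* Sized explicitly: the entry for OP_C exists and is NULL, so indexing
 * it stays inside the array. */
static const op_handler cb_table[NUM_OPS] = {
    [OP_A] = handle_a,
    [OP_B] = handle_b,
};

/* Returns the handler's result, or -1 for an unknown or unsupported op. */
static int dispatch(int op, int arg)
{
    if (op < 0 || op >= NUM_OPS || cb_table[op] == NULL)
        return -1;
    return cb_table[op](arg);
}
```

Had `cb_table` been declared as `cb_table[]`, it would hold two entries, and a lookup of `OP_C` guarded only by `op < NUM_OPS` would read past its end.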
-
Committed by Mark Bloch
RDMA_NL_GET_OP is defined like this: (type & ((1 << 10) - 1)), which means op (defined as an int) can never be a negative number. Fixes: b2cbae2c ('RDMA: Add netlink infrastructure') Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
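A short sketch makes the argument concrete. The macro body is the expression quoted in the commit message; `op_in_range` is a hypothetical helper added only for illustration.

```c
/* Mask as quoted in the commit message: keeps the low 10 bits of type. */
#define RDMA_NL_GET_OP(type) ((type) & ((1 << 10) - 1))

/* Hypothetical helper: non-zero when the op extracted from type is a
 * valid index into a table of nops entries. The result of the mask is
 * always in the range 0..1023, so an extra "op < 0" test would be dead
 * code. */
static int op_in_range(int type, int nops)
{
    int op = RDMA_NL_GET_OP(type);

    return op < nops;
}
```

Even a negative `type` such as -1 yields 1023 after masking, so only the upper bound needs checking.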
-
Committed by Mark Bloch
In case ibnl_put_msg fails in send_nlmsg_done, the function returns with -ENOMEM without freeing. This patch fixes this behavior. Fixes: 30dc5e63 ("RDMA/core: Add support for iWARP Port Mapper user space service") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Andy Shevchenko
There is no need to duplicate a lot of code that has been in the kernel library for ages. Replace the duplicated code with a direct call to print_hex_dump(). Note that the output is slightly changed: - hex and ascii parts are separated by just two spaces - there is no delimiter for ascii portions - file and line are removed from the prefix (they were redundant anyway, since the preceding output shows the same closely enough) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Colin Ian King
Fix spelling mistake: argumant -> argument. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Denys Vlasenko
This function compiles to 550 bytes of machine code. Three callsites, all in nes_create_qp. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Faisal Latif <faisal.latif@intel.com> CC: Doug Ledford <dledford@redhat.com> CC: linux-rdma@vger.kernel.org CC: linux-kernel@vger.kernel.org Reviewed-By: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Tatyana Nikolova
Add sq and rq drain functions, which block until all previously posted wr-s in the specified queue have completed. A completion object is signaled to unblock the thread when the last cqe for the corresponding queue is processed. Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Bart Van Assche
Prevent sparse from complaining about the comparison of s_addr with INADDR_ANY. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Faisal Latif <faisal.latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Bart Van Assche
Avoid triggering the following BUG() on a debug kernel: kernel BUG at include/linux/scatterlist.h:92! RIP: 0010:[<ffffffffa0467199>] [<ffffffffa0467199>] srp_map_idb+0x199/0x1a0 [ib_srp] Call Trace: [<ffffffffa04685fa>] srp_map_data+0x84a/0x890 [ib_srp] [<ffffffffa0469674>] srp_queuecommand+0x1e4/0x610 [ib_srp] [<ffffffff813f5a5e>] scsi_dispatch_cmd+0x9e/0x180 [<ffffffff813f8b07>] scsi_request_fn+0x477/0x610 [<ffffffff81298ffe>] __blk_run_queue+0x2e/0x40 [<ffffffff81299070>] blk_delay_work+0x20/0x30 [<ffffffff81071f07>] process_one_work+0x197/0x480 [<ffffffff81072239>] worker_thread+0x49/0x490 [<ffffffff810787ea>] kthread+0xea/0x100 [<ffffffff8159b632>] ret_from_fork+0x22/0x40 Fixes: f7f7aab1 ("IB/srp: Convert to new registration API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # v4.4+ Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Hans Westgaard Ry
IPoIB collects traffic statistics, including the number of packets sent/received, the number of bytes transferred, and certain errors. This patch makes these statistics available to be queried by ethtool. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Tested-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
The last user of pkey_mutex was removed in db84f880 ("IB/ipoib: Use P_Key change event instead of P_Key polling mechanism") but the lock remained. This patch removes it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Colin Ian King
Passing hw_stats by value requires a 280-byte copy, so passing it by reference instead is much more efficient. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Lars-Peter Clausen
Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race-condition-free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch: // <smpl> @@ expression irq; @@ -synchronize_irq(irq); free_irq(irq, ...); // </smpl> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif#intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Julia Lawall
The i40iw_vf_cqp_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Bart Van Assche
__force casts should be avoided if there is a better alternative. Hence modify the comparison of s_addr with INADDR_ANY such that the __force cast is no longer necessary. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Vipul Pandya <vipul@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Hariprasad S
These handlers, when called, print an error message to the kernel log, but the actual handling is done by _c4iw_free_ep() and process_timeout(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Hariprasad S
Currently c4iw_peer_abort_intr() does not wake up the waiter if the endpoint state indicates we're using MPAv2 and we're currently trying to connect. This was introduced with commit 7c0a33d6 ("RDMA/cxgb4: Don't wakeup threads for MPAv2"). However, this original fix is flawed because it introduces a race that can cause a deadlock of the iwarp stack. Here is the race: ->local side sets up an active offload connection. ->local side sends MPA_START request. ->peer sends MPA_START response. ->local side ingress cpl thread begins processing the MPA_START response, but before it changes the state from MPA_REQ_SENT to FPDU_MODE: ->peer sends a RST which results in an ABORT_REQ_RSS. This triggers peer_abort_intr() which sees the state in MPA_REQ_SENT and, since mpa_rev is 2, avoids waking up the endpoint with -ECONNRESET, assuming the stack will re-attempt the connection using MPAv1. ->Meanwhile, the cpl thread moves the state to FPDU_MODE and calls c4iw_modify_rc_qp() which calls rdma_init() which sends a RI_WR/INIT WR to firmware. But since HW sent an abort, FW correctly drops the RI_WR/INIT WR. ->So the cpl thread is stuck waiting for a reply and cannot process the ABORT_REQ_RSS cpl sitting in its input queue. Thus everything comes to a halt because no more ingress cpls are processed by the stack... The correct fix for the issue is to always do the wake up in c4iw_abort_intr() but reinitialize the wait object in c4iw_reconnect(). Fixes: 7c0a33d6 ("RDMA/cxgb4: Don't wakeup threads for MPAv2") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-