- 29 10月, 2015 1 次提交
-
-
由 Sagi Grimberg 提交于
The block layer can reliably guarantee that SG lists won't contain gaps (page unaligned) if a driver set the queue virt_boundary. With this setting the block layer will: - refuse merges if bios are not aligned to the virtual boundary - split bios/requests that are not aligned to the virtual boundary - or, bounce buffer SG_IOs that are not aligned to the virtual boundary Since iser is working in 4K page size, set the virt_boundary to 4K pages. With this setting, we can now safely remove the bounce buffering logic in iser. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 23 10月, 2015 2 次提交
-
-
由 Bart Van Assche 提交于
Detected this by compiling with W=1. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Detected this by compiling with W=1. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 22 10月, 2015 2 次提交
-
-
由 Matan Barak 提交于
Adding an ability to query the IB cache by a netdev and get the attributes of a GID. These parameters are necessary in order to successfully resolve the required GID (when the netdevice is known) and get the Ethernet L2 attributes from a GID. Signed-off-by: NMatan Barak <matanb@mellanox.com> Reviewed-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Geliang Tang 提交于
Just fix a typo in the code comment. Signed-off-by: NGeliang Tang <geliangtang@163.com> Acked-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 14 10月, 2015 1 次提交
-
-
由 Christoph Lameter 提交于
When we leave the multicast group on expiration of a neighbor we do not free the mcast structure. This results in a memory leak that causes ib_dealloc_pd to fail and print a WARN_ON message and backtrace. Fixes: bd99b2e0 (IB/ipoib: Expire sendonly multicast joins) Signed-off-by: NChristoph Lameter <cl@linux.com> Tested-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 26 9月, 2015 3 次提交
-
-
由 Doug Ledford 提交于
When performing sendonly joins, we queue the packets that trigger the join until the join completes. This may take on the order of hundreds of milliseconds. It is easy to have many more than three packets come in during that time. Expand the maximum queue depth in order to try and prevent dropped packets during the time it takes to join the multicast group. Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Doug Ledford 提交于
Since IPoIB should, as much as possible, emulate how multicast sends work on Ethernet for regular TCP/IP apps, there should be no requirement to subscribe to a multicast group before your sends are properly sent. However, due to the difference in how multicast is handled on InfiniBand, we must join the appropriate multicast group before we can send to it. Previously we tried not to trigger the auto-create feature of the subnet manager when doing this because we didn't have tracking of these sendonly groups and the auto-creation might never get undone. The previous patch added timing to these sendonly joins and allows us to leave them after a reasonable idle expiration time. So supply all of the information needed to auto-create group. Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Christoph Lameter 提交于
On neighbor expiration, check to see if the neighbor was actually a sendonly multicast join, and if so, leave the multicast group as we expire the neighbor. Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 25 9月, 2015 1 次提交
-
-
由 Sagi Grimberg 提交于
This module parameter forces memory registration even for a continuous memory region. It is true by default as sending an all-physical rkey with remote permissions might be insecure. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 16 9月, 2015 7 次提交
-
-
由 Jenny Derzhavetz 提交于
Given that supporting zcopy immediate data for all IOs requires iser driver to use its own buffer allocations, we settle with avoiding data copy for IOs with data length of up to 8K (which is more latency sensitive anyway). This trims IO write latency by up to 3us and increase IOPs by up to 40% by saving CPU time doing sg_copy_from_buffer (8K IO size is the obvious winner here). Signed-off-by: NJenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
-
由 Jenny Derzhavetz 提交于
iser target batches post recv operations to avoid the overhead of acquiring the recv queue lock and posting a HW doorbell for each command. We change it to be per command in order to support zcopy immediate data for IOs that fits in the 8K transfer boundary (in the next patch). (Fix minor patch fuzz due to ib_mr removal - nab) Signed-off-by: NJenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
-
由 Jenny Derzhavetz 提交于
Instead of handing a connection to the iscsi stack for processing right after accepting (rdma_accept) we only hand the connection to the iscsi core after we reached to a connected state (ESTABLISHED CM event). This will prevent two error scenrios: 1. race between rdma connection teardown and iscsi login sequence reported by Nic in: (ce9a9fc2 "iser-target: Fix REJECT CM event use-after-free OOPs") 2. target stack shutdown sequence race with constant login attempts by multiple initiators. We address this by maintaining two queues at the isert_np level: - accepted: connections that were accepted but have not reached connected state (might get rejected, unreachable or error). - pending: connections in connected state, but have yet to handed to the iscsi core for login processing. iser connections are promoted to the pending queue only from the accepted queue. This way the iscsi core now will only handle functional iser connections and once we shutdown the target stack, we look for any stales that got left behind so we can safely release them. Signed-off-by: NJenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
-
由 Jenny Derzhavetz 提交于
These are always referenced from np-> so no need for the prefix. Signed-off-by: NJenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
-
由 Jenny Derzhavetz 提交于
Signed-off-by: NJenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
-
由 Jenny Derzhavetz 提交于
The iscsi target core teardown sequence calls wait_conn for all active commands to finish gracefully by: - move the queue-pair to error state - drain all the completions - wait for the core to finish handling all session commands However, when tearing down a session while there are sequenced commands that are still waiting for unsolicited data outs, we can block forever as these are missing an extra reference put. We basically need the equivalent of iscsit_free_queue_reqs_for_conn() which is called after wait_conn has returned. Address this by an explicit walk on conn_cmd_list and put the extra reference. Signed-off-by: NJenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
-
由 Jenny Derzhavetz 提交于
As documented in iscsit_sequence_cmd: /* * Existing callers for iscsit_sequence_cmd() will silently * ignore commands with CMDSN_LOWER_THAN_EXP, so force this * return for CMDSN_MAXCMDSN_OVERRUN as well.. */ We need to silently finish a command when it's in ISTATE_REMOVE. This fixes an teardown hang we were seeing where a mis-behaved initiator (triggered by allocation error injections) sent us a cmdsn which was lower than expected. Signed-off-by: NJenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
-
- 04 9月, 2015 3 次提交
-
-
由 Jason Gunthorpe 提交于
We expect send only joins to fail, it just means there are no listeners for the group. The correct thing to do is silently drop the packet at source. Eg avahi will full join 224.0.0.251 which causes a send only IGMP packet to 224.0.0.22, and then a warning level kmessage like this: ib0: sendonly multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0016, status -22 If there is no IP router listening to IGMP. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Doug Ledford 提交于
Even though we don't expect the group to be created by the SM we sill need to provide all the parameters to force the SM to validate they are correct. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
srp_destroy_qp is designed to indicate we are safe to continue with freeing the channel resources by modifying the qp error state, posting a dummy wr on the queue-pair and waiting for it to flush. This also holds for the channel registration pool as we are unmapping the memory region when handling a scsi response. Destroying the channel registration pool before we make sure we processed all the inflight IO might introduce a use-after-free of the registration pool. This use-after-free is demonstrated in the stack trace below where srp is trying to unmap a used FMR after the fmr_pool was already destroyed. general protection fault: 0000 [#1] SMP RIP: 0010:[<ffffffff8151121b>] [<ffffffff8151121b>] _raw_spin_lock_irqsave+0x1b/0x50 Call Trace: [<ffffffffa055d88a>] ib_fmr_pool_unmap+0x1a/0xb0 [ib_core] [<ffffffffa06c00ed>] srp_unmap_data.isra.28+0x17d/0x250 [ib_srp] [<ffffffffa06c01eb>] srp_free_req+0x2b/0x60 [ib_srp] [<ffffffffa06c0c94>] srp_recv_completion+0x174/0x580 [ib_srp] [<ffffffffa04580fe>] mlx4_eq_int+0x4de/0xe50 [mlx4_core] [<ffffffffa0458b00>] mlx4_msi_x_interrupt+0x10/0x20 [mlx4_core] [<ffffffff810abc45>] handle_irq_event_percpu+0x35/0x1b0 [<ffffffff810abdf2>] handle_irq_event+0x32/0x50 [<ffffffff810ae5cf>] handle_edge_irq+0x6f/0x120 [<ffffffff8100455a>] handle_irq+0x1a/0x30 [<ffffffff8151b475>] do_IRQ+0x45/0xb0 [<ffffffff8151162d>] common_interrupt+0x6d/0x6d [<ffffffff813e4d2f>] cpuidle_enter_state+0x4f/0xc0 [<ffffffff813e4e6c>] cpuidle_idle_call+0xcc/0x210 [<ffffffff8100b9ea>] arch_cpu_idle+0xa/0x30 [<ffffffff810ab1e1>] cpu_startup_entry+0xe1/0x270 [<ffffffff81030b3a>] start_secondary+0x21a/0x2c0 Reported-by: NEliott Kespi <eliottk@mellanox.com> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 31 8月, 2015 20 次提交
-
-
由 Jason Gunthorpe 提交于
The majority of callers never check the return value, and even if they did, they can't do anything about a failure. All possible failure cases represent a bug in the caller, so just WARN_ON inside the function instead. This fixes a few random errors: net/rd/iw.c infinite loops while it fails. (racing with EBUSY?) This also lays the ground work to get rid of error return from the drivers. Most drivers do not error, the few that do are broken since it cannot be handled. Since uverbs can legitimately make use of EBUSY, open code the check. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
The SRP initiator only needs this if the insecure register_always=N performance optimization is enabled, or if FRWR/FMR is not supported in the driver. Do not create an all physical MR unless it is needed to support either of those modes. Default register_always to true so the out of the box configuration does not create an insecure all physical MR. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> [bvanassche: reworked and rebased this patch] Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Instead of always using the global rkey for the indirect data buffer descriptor, register that descriptor with the HCA if the kernel module parameter register_always has been set to Y. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Introduce the variable srp_device.use_fmr. Leave out the dev->has_fr / dev->has_fmr and ch->fr_pool / ch->fmr_pool checks since these are redundant. This patch does not change any functionality but makes the source code easier to read. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Move the srp_map_desc() call from inside srp_map_sg_entry() to srp_map_sg() such that the use_mr argument can be removed from srp_map_sg_entry(). Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Mapping a discontiguous sg-list requires multiple memory regions and hence can exhaust the memory region pool. The SRP initiator already handles this by temporarily reducing the queue depth. This means that it is safe to remove the memory registration backtracking code. This patch has been tested with direct I/O sizes up to 256 MB. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Although most paths through which a request is submitted check block layer parameters like the max_segments limit, these are not checked when an SG_IO or direct I/O request is submitted. Hence add a range check for the memory descriptor array pointer. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Instead of using the global rkey for large memory regions, use multiple registrations. See also the while (dma_len) loop further down in srp_map_sg_entry(). Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
During a discussion in 2011 nobody recalled why FMR was not used for non-page aligned buffers (see also http://thread.gmane.org/gmane.linux.drivers.rdma/7149). Re-enable FMR for such buffers. For the reason why the srp_map_fmr() function needs to be modified, see also patch "IB/srp: rework mapping engine to use multiple FMR entries" (commit ID 8f26c9ff; January 2011). Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
Replace all leys with pd->local_dma_lkey. This driver does not support iWarp, so this is safe. The insecure use of ib_get_dma_mr is thus isolated to an rkey, and will have to be fixed separately. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
Replace all leys with pd->local_dma_lkey. This driver does not support iWarp, so this is safe. The insecure use of ib_get_dma_mr is thus isolated to an rkey, and this looks trivially fixed by forcing the use of registration in a future patch. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Chaning of send work requests benefits performance by reducing the send queue lock contention (acquired in ib_post_send) and saves us HW doorbells which is posted only once. Currently, in normal IO flows iser does not chain the CDB send work request with the registration work request. Also in PI flows, signature work requests are not chained as well. Lets chain those and post only once. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Easier to debug when we have the registration details. This patch does not change any functionality. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAdir Lev <adirl@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
iser support up to 512KB data transfer in a single scsi command. This means that larger IOs will split to different request. While iser can easily saturate FDR/EDR wires, some arrays are fine tuned for 1MB (or larger) IO sizes, hence add an option to support larger transfers (up to 8MB) if the device allows it. Given that a few target implementations don't support data transfers of more than 512KB by default and the fact that larger IO sizes require more resources, we introduce a module parameter to determine the maximum number of 512B sectors in a single scsi command. Users that are interested in larger transfers can change this value given that the target supports larger transfers. At the moment, iser works in 4K pages granularity, In a later stage we will get it to work with system page size instead. IO operations that consists of N pages will need a page vector of size N+1 in case the first SG element contains an offset. Given that some devices allocates memory regions in powers of 2, this means that allocating a region with N+1 pages, will result in region resources allocation of the next power of 2. Since we don't want that to happen, in case we are in the limit of IO size supported and the first SG element has an offset, we align the SG list using a bounce buffer (which is OK given that this is not likely to happen a lot). Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Hard coded for now. This will allow to allocate different sized MRs depending on the IO size needed (and device capabilities). This patch does not change any functionality. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
iser_reg_rdma_mem_[fastreg|fmr] share a lot of code, and logically do the same thing other than the buffer registration method itself (iser_fast_reg_mr vs. iser_fast_reg_fmr). The DIF logic is not implemented in the FMR flow as there is no existing device that supports FMRs and Signature feature. This patch unifies the flow in a single routine iser_reg_rdma_mem and just split to fmr/frwr for the buffer registration itself. Also, for symmetry reasons, unify iser_unreg_rdma_mem (which will call the relevant device specific unreg routine). Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAdir Lev <adirl@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
As for fmrs we will hold a single registration descriptor as no need for multiple like in the frwr mode (descriptor for each task). This change helps unifying the duplicate registration code paths. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAdir Lev <adirl@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-