- 08 10月, 2015 1 次提交
-
-
由 Christoph Hellwig 提交于
This patch split up struct ib_send_wr so that all non-trivial verbs use their own structure which embedds struct ib_send_wr. This dramaticly shrinks the size of a WR for most common operations: sizeof(struct ib_send_wr) (old): 96 sizeof(struct ib_send_wr): 48 sizeof(struct ib_rdma_wr): 64 sizeof(struct ib_atomic_wr): 96 sizeof(struct ib_ud_wr): 88 sizeof(struct ib_fast_reg_wr): 88 sizeof(struct ib_bind_mw_wr): 96 sizeof(struct ib_sig_handover_wr): 80 And with Sagi's pending MR rework the fast registration WR will also be down to a reasonable size: sizeof(struct ib_fastreg_wr): 64 Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt] Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc] Tested-by: NHaggai Eran <haggaie@mellanox.com> Tested-by: NSagi Grimberg <sagig@mellanox.com> Tested-by: NSteve Wise <swise@opengridcomputing.com>
-
- 26 9月, 2015 3 次提交
-
-
由 Doug Ledford 提交于
When performing sendonly joins, we queue the packets that trigger the join until the join completes. This may take on the order of hundreds of milliseconds. It is easy to have many more than three packets come in during that time. Expand the maximum queue depth in order to try and prevent dropped packets during the time it takes to join the multicast group. Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Doug Ledford 提交于
Since IPoIB should, as much as possible, emulate how multicast sends work on Ethernet for regular TCP/IP apps, there should be no requirement to subscribe to a multicast group before your sends are properly sent. However, due to the difference in how multicast is handled on InfiniBand, we must join the appropriate multicast group before we can send to it. Previously we tried not to trigger the auto-create feature of the subnet manager when doing this because we didn't have tracking of these sendonly groups and the auto-creation might never get undone. The previous patch added timing to these sendonly joins and allows us to leave them after a reasonable idle expiration time. So supply all of the information needed to auto-create group. Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Christoph Lameter 提交于
On neighbor expiration, check to see if the neighbor was actually a sendonly multicast join, and if so, leave the multicast group as we expire the neighbor. Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 25 9月, 2015 1 次提交
-
-
由 Sagi Grimberg 提交于
This module parameter forces memory registration even for a continuous memory region. It is true by default as sending an all-physical rkey with remote permissions might be insecure. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 04 9月, 2015 3 次提交
-
-
由 Jason Gunthorpe 提交于
We expect send only joins to fail, it just means there are no listeners for the group. The correct thing to do is silently drop the packet at source. Eg avahi will full join 224.0.0.251 which causes a send only IGMP packet to 224.0.0.22, and then a warning level kmessage like this: ib0: sendonly multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0016, status -22 If there is no IP router listening to IGMP. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Doug Ledford 提交于
Even though we don't expect the group to be created by the SM we sill need to provide all the parameters to force the SM to validate they are correct. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
srp_destroy_qp is designed to indicate we are safe to continue with freeing the channel resources by modifying the qp error state, posting a dummy wr on the queue-pair and waiting for it to flush. This also holds for the channel registration pool as we are unmapping the memory region when handling a scsi response. Destroying the channel registration pool before we make sure we processed all the inflight IO might introduce a use-after-free of the registration pool. This use-after-free is demonstrated in the stack trace below where srp is trying to unmap a used FMR after the fmr_pool was already destroyed. general protection fault: 0000 [#1] SMP RIP: 0010:[<ffffffff8151121b>] [<ffffffff8151121b>] _raw_spin_lock_irqsave+0x1b/0x50 Call Trace: [<ffffffffa055d88a>] ib_fmr_pool_unmap+0x1a/0xb0 [ib_core] [<ffffffffa06c00ed>] srp_unmap_data.isra.28+0x17d/0x250 [ib_srp] [<ffffffffa06c01eb>] srp_free_req+0x2b/0x60 [ib_srp] [<ffffffffa06c0c94>] srp_recv_completion+0x174/0x580 [ib_srp] [<ffffffffa04580fe>] mlx4_eq_int+0x4de/0xe50 [mlx4_core] [<ffffffffa0458b00>] mlx4_msi_x_interrupt+0x10/0x20 [mlx4_core] [<ffffffff810abc45>] handle_irq_event_percpu+0x35/0x1b0 [<ffffffff810abdf2>] handle_irq_event+0x32/0x50 [<ffffffff810ae5cf>] handle_edge_irq+0x6f/0x120 [<ffffffff8100455a>] handle_irq+0x1a/0x30 [<ffffffff8151b475>] do_IRQ+0x45/0xb0 [<ffffffff8151162d>] common_interrupt+0x6d/0x6d [<ffffffff813e4d2f>] cpuidle_enter_state+0x4f/0xc0 [<ffffffff813e4e6c>] cpuidle_idle_call+0xcc/0x210 [<ffffffff8100b9ea>] arch_cpu_idle+0xa/0x30 [<ffffffff810ab1e1>] cpu_startup_entry+0xe1/0x270 [<ffffffff81030b3a>] start_secondary+0x21a/0x2c0 Reported-by: NEliott Kespi <eliottk@mellanox.com> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 31 8月, 2015 32 次提交
-
-
由 Jason Gunthorpe 提交于
The majority of callers never check the return value, and even if they did, they can't do anything about a failure. All possible failure cases represent a bug in the caller, so just WARN_ON inside the function instead. This fixes a few random errors: net/rd/iw.c infinite loops while it fails. (racing with EBUSY?) This also lays the ground work to get rid of error return from the drivers. Most drivers do not error, the few that do are broken since it cannot be handled. Since uverbs can legitimately make use of EBUSY, open code the check. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
The SRP initiator only needs this if the insecure register_always=N performance optimization is enabled, or if FRWR/FMR is not supported in the driver. Do not create an all physical MR unless it is needed to support either of those modes. Default register_always to true so the out of the box configuration does not create an insecure all physical MR. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> [bvanassche: reworked and rebased this patch] Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Instead of always using the global rkey for the indirect data buffer descriptor, register that descriptor with the HCA if the kernel module parameter register_always has been set to Y. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Introduce the variable srp_device.use_fmr. Leave out the dev->has_fr / dev->has_fmr and ch->fr_pool / ch->fmr_pool checks since these are redundant. This patch does not change any functionality but makes the source code easier to read. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Move the srp_map_desc() call from inside srp_map_sg_entry() to srp_map_sg() such that the use_mr argument can be removed from srp_map_sg_entry(). Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Mapping a discontiguous sg-list requires multiple memory regions and hence can exhaust the memory region pool. The SRP initiator already handles this by temporarily reducing the queue depth. This means that it is safe to remove the memory registration backtracking code. This patch has been tested with direct I/O sizes up to 256 MB. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Although most paths through which a request is submitted check block layer parameters like the max_segments limit, these are not checked when an SG_IO or direct I/O request is submitted. Hence add a range check for the memory descriptor array pointer. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
Instead of using the global rkey for large memory regions, use multiple registrations. See also the while (dma_len) loop further down in srp_map_sg_entry(). Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bart Van Assche 提交于
During a discussion in 2011 nobody recalled why FMR was not used for non-page aligned buffers (see also http://thread.gmane.org/gmane.linux.drivers.rdma/7149). Re-enable FMR for such buffers. For the reason why the srp_map_fmr() function needs to be modified, see also patch "IB/srp: rework mapping engine to use multiple FMR entries" (commit ID 8f26c9ff; January 2011). Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
Replace all leys with pd->local_dma_lkey. This driver does not support iWarp, so this is safe. The insecure use of ib_get_dma_mr is thus isolated to an rkey, and will have to be fixed separately. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
Replace all leys with pd->local_dma_lkey. This driver does not support iWarp, so this is safe. The insecure use of ib_get_dma_mr is thus isolated to an rkey, and this looks trivially fixed by forcing the use of registration in a future patch. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Chaning of send work requests benefits performance by reducing the send queue lock contention (acquired in ib_post_send) and saves us HW doorbells which is posted only once. Currently, in normal IO flows iser does not chain the CDB send work request with the registration work request. Also in PI flows, signature work requests are not chained as well. Lets chain those and post only once. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Easier to debug when we have the registration details. This patch does not change any functionality. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAdir Lev <adirl@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
iser support up to 512KB data transfer in a single scsi command. This means that larger IOs will split to different request. While iser can easily saturate FDR/EDR wires, some arrays are fine tuned for 1MB (or larger) IO sizes, hence add an option to support larger transfers (up to 8MB) if the device allows it. Given that a few target implementations don't support data transfers of more than 512KB by default and the fact that larger IO sizes require more resources, we introduce a module parameter to determine the maximum number of 512B sectors in a single scsi command. Users that are interested in larger transfers can change this value given that the target supports larger transfers. At the moment, iser works in 4K pages granularity, In a later stage we will get it to work with system page size instead. IO operations that consists of N pages will need a page vector of size N+1 in case the first SG element contains an offset. Given that some devices allocates memory regions in powers of 2, this means that allocating a region with N+1 pages, will result in region resources allocation of the next power of 2. Since we don't want that to happen, in case we are in the limit of IO size supported and the first SG element has an offset, we align the SG list using a bounce buffer (which is OK given that this is not likely to happen a lot). Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Hard coded for now. This will allow to allocate different sized MRs depending on the IO size needed (and device capabilities). This patch does not change any functionality. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
iser_reg_rdma_mem_[fastreg|fmr] share a lot of code, and logically do the same thing other than the buffer registration method itself (iser_fast_reg_mr vs. iser_fast_reg_fmr). The DIF logic is not implemented in the FMR flow as there is no existing device that supports FMRs and Signature feature. This patch unifies the flow in a single routine iser_reg_rdma_mem and just split to fmr/frwr for the buffer registration itself. Also, for symmetry reasons, unify iser_unreg_rdma_mem (which will call the relevant device specific unreg routine). Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAdir Lev <adirl@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
As for fmrs we will hold a single registration descriptor as no need for multiple like in the frwr mode (descriptor for each task). This change helps unifying the duplicate registration code paths. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAdir Lev <adirl@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Also, change a name of a local variable. This patch does not change any functionality. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAdir Lev <adirl@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Adir Lev 提交于
This will allow us to unify the memory registration code path between the various methods which vary by the device capabilities. This change will make it easier and less intrusive to remove fmr_pools from the code when we'd want to. The reason we use a single descriptor is to avoid taking a redundant spinlock when working with FMRs. We also change the signature of iser_reg_page_vec to make it match iser_fast_reg_mr (and the future indirect registration method). Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAdir Lev <adirl@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Instead of having it a part of the connection structure, have it be under a dedicated (embedded) structure in the connection. A logical separation of the registration pool and the connection structure. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAdir Lev <adirl@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Don't have the caller allocate the structure and worry about freeing it in case the routine failed. This patch does not change any functionality. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAdir Lev <adirl@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Move all the per-device function pointers to an easy extensible iser_reg_ops structure that contains all the iser registration operations. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
In the past the we always tried to allocate an fmr_pool and if it failed on ENOSYS (not supported) then we continued with dma mr. This is not the case anymore and if we tried to allocate an fmr_pool then it is supported and we expect to succeed. Also, the check if fmr_pool is allocated when free is called is redundant as well as we are guaranteed it exists. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Avoid struct names without iser_ prefix. This patch does not change any functionality. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Have fast_reg_descriptor hold struct iser_reg_resources (mr, frpl, valid flag). This will be useful when the actual buffer registration routines will be passed with the needed registration resources (i.e. iser_reg_resources) without being aware of their nature (i.e. data or protection). In order to achieve this, we remove reg_indicators flags container and place specific flags (mr_valid) within iser_reg_resources struct. We also place the sig_mr_valid and sig_protcted flags in iser_pi_context. This patch also modifies iser_fast_reg_mr to receive the reg_resources instead of the fast_reg_descriptor and a data/protection indicator. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAdir Lev <adirl@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
We can do it in iser_aligned_data_len instead and it will save us an argument that is passed to fall_to_counce_buf just for the print. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
We always call iser_initialize_task_headers() and set the header tx_sg.lkey to the device mr lkey, so no point in checking it in iser_create_send_desc(). Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
If iser_initialize_task_headers() routine failed before dma mapping, we should not attempt to unmap in cleanup_task(). Fixes: 7414dde0 (IB/iser: Fix race between iser connection ...) Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
We don't update those anywhere in the code and they seem pretty useless (no one seem to care about those). qp_tx_queue_full: We never should get this fmr_map_not_avail: We can never get to this eh_abort_cnt: We don't monitor aborts Go ahead and remove them. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-