- 02 4月, 2019 1 次提交
-
-
由 Matthew Wilcox 提交于
Also remove hfi1_devs_list. Signed-off-by: NMatthew Wilcox <willy@infradead.org> Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 30 3月, 2019 1 次提交
-
-
由 Matthew Wilcox 提交于
Signed-off-by: NMatthew Wilcox <willy@infradead.org> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 28 3月, 2019 1 次提交
-
-
由 Bart Van Assche 提交于
Enable format string checking for hfi1_cdbg() and fix the resulting compiler warnings. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 26 3月, 2019 1 次提交
-
-
由 Mike Marciniszyn 提交于
The adaptive PIO implementation only considers the current packet size when deciding between SDMA and pio for a packet. This causes credit return forces if small and large packets are interleaved. Add a running average to avoid costly credit forces so that a large sequence of small packets is required to go below the threshold that chooses pio. Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 07 3月, 2019 1 次提交
-
-
由 Michael J. Ruhl 提交于
When disabling and removing a receive context, it is possible for an asynchronous event (i.e IRQ) to occur. Because of this, there is a race between cleaning up the context, and the context being used by the asynchronous event. cpu 0 (context cleanup) rc->ref_count-- (ref_count == 0) hfi1_rcd_free() cpu 1 (IRQ (with rcd index)) rcd_get_by_index() lock ref_count+++ <-- reference count race (WARNING) return rcd unlock cpu 0 hfi1_free_ctxtdata() <-- incorrect free location lock remove rcd from array unlock free rcd This race will cause the following WARNING trace: WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 Call Trace: dump_stack+0x19/0x1b __warn+0xd8/0x100 warn_slowpath_null+0x1d/0x20 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] is_rcv_urgent_int+0x24/0x90 [hfi1] general_interrupt+0x1b6/0x210 [hfi1] __handle_irq_event_percpu+0x44/0x1c0 handle_irq_event_percpu+0x32/0x80 handle_irq_event+0x3c/0x60 handle_edge_irq+0x7f/0x150 handle_irq+0xe4/0x1a0 do_IRQ+0x4d/0xf0 common_interrupt+0x162/0x162 The race can also lead to a use after free which could be similar to: general protection fault: 0000 1 SMP CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230 Call Trace: ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] hfi1_aio_write+0xba/0x110 [hfi1] do_sync_readv_writev+0x7b/0xd0 do_readv_writev+0xce/0x260 ? handle_mm_fault+0x39d/0x9b0 ? pick_next_task_fair+0x5f/0x1b0 ? sched_clock_cpu+0x85/0xc0 ? __schedule+0x13a/0x890 vfs_writev+0x35/0x60 SyS_writev+0x7f/0x110 system_call_fastpath+0x22/0x27 Use the appropriate kref API to verify access. Reorder context cleanup to ensure context removal before cleanup occurs correctly. Cc: stable@vger.kernel.org # v4.14.0+ Fixes: f683c80c ("IB/hfi1: Resolve kernel panics by reference counting receive contexts") Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMichael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 06 3月, 2019 1 次提交
-
-
由 Anshuman Khandual 提交于
Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.comSigned-off-by: NAnshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx] Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 2月, 2019 1 次提交
-
-
由 Gustavo A. R. Silva 提交于
Fix the following warning by adding a missing break: drivers/infiniband/hw/hfi1/tid_rdma.c: In function ‘hfi1_tid_rdma_wqe_interlock’: drivers/infiniband/hw/hfi1/tid_rdma.c:3251:3: warning: this statement may fall through [-Wimplicit-fallthrough=] switch (prev->wr.opcode) { ^~~~~~ drivers/infiniband/hw/hfi1/tid_rdma.c:3259:2: note: here case IB_WR_RDMA_READ: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Fixes: c6c23117 ("IB/hfi1: Add interlock between TID RDMA WRITE and other requests") Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: NKaike Wan <Kaike.wan@intel.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 16 2月, 2019 1 次提交
-
-
由 Kaike Wan 提交于
The following build warning was produced for the TID RDMA READ patch ("IB/hfi1: Enable TID RDMA READ protocol"): drivers/infiniband/hw/hfi1/qp.c: In function 'hfi1_setup_wqe': drivers/infiniband/hw/hfi1/qp.c:328:3: warning: this statement may fall through [-Wimplicit-fallthrough=] hfi1_setup_tid_rdma_wqe(qp, wqe); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/hfi1/qp.c:329:2: note: here case IB_QPT_UC: ^~~~ This patch will fix the issue by adding the "fall through" comment. Fixes: f1ab4efa ("IB/hfi1: Enable TID RDMA READ protocol") Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 08 2月, 2019 2 次提交
-
-
由 Davidlohr Bueso 提交于
This driver already uses gup_fast() and thus we can just drop the mmap_sem protection around the pinned_vm counter. Note that the window between when hfi1_can_pin_pages() is called and the actual counter is incremented remains the same as mmap_sem was _only_ used for when ->pinned_vm was touched. Reviewed-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NDavidlohr Bueso <dbueso@suse.det> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Davidlohr Bueso 提交于
Taking a sleeping lock to _only_ increment a variable is quite the overkill, and pretty much all users do this. Furthermore, some drivers (ie: infiniband and scif) that need pinned semantics can go to quite some trouble to actually delay via workqueue (un)accounting for pinned pages when not possible to acquire it. By making the counter atomic we no longer need to hold the mmap_sem and can simply some code around it for pinned_vm users. The counter is 64-bit such that we need not worry about overflows such as rdma user input controlled from userspace. Reviewed-by: NIra Weiny <ira.weiny@intel.com> Reviewed-by: NChristoph Lameter <cl@linux.com> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 06 2月, 2019 30 次提交
-
-
由 Kaike Wan 提交于
ACK packets are generally associated with request completion and resource release and therefore should be sent first. This patch optimizes the send engine by using the following policies: (1) QPs with RVT_S_ACK_PENDING bit set in qp->s_flags or qpriv->s_flags should have their priority incremented; (2) QPs with ACK or TID-ACK packet queued should have their priority incremented; (3) When a QP is queued to the wait list due to resource constraints, it will be queued to the head if it has ACK packet to send; (4) When selecting qps to run from the wait list, the one with the highest priority and starve_cnt will be selected; each priority will be equivalent to a fixed number of starve_cnt (16). Reviewed-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch makes the following changes to the static trace: 1. Adds the decoding of TID RDMA WRITE packets in IB header trace; 2. Adds trace events for various stages of the TID RDMA WRITE protocol. These events provide a fine-grained control for monitoring and debugging the hfi1 driver in the filed. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch enables TID RDMA WRITE protocol by converting a qualified RDMA WRITE request into a TID RDMA WRITE request internally: (1) The TID RDMA cability must be enabled; (2) The request must start on a 4K page boundary; (3) The request length must be a multiple of 4K and must be larger or equal to 256K. Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This locking mechanism is designed to provent vavious memory corruption scenarios from occurring when requests are pipelined, especially when RDMA WRITE requests are interleaved with TID RDMA READ requests: 1. READ-AFTER-READ; 2. READ-AFTER-WRITE; 3. WRITE-AFTER-READ; 4. WRITE-AFTER-WRITE. When memory corruption is likely, a request will be held back until previous requests have been completed. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch integrates TID RDMA WRITE protocol into normal RDMA verbs framework. The TID RDMA WRITE protocol is an end-to-end protocol between the hfi1 drivers on two OPA nodes that converts a qualified RDMA WRITE request into a TID RDMA WRITE request to avoid data copying on the responder side. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
The "Second Leg" of the TID RDMA WRITE protocol deals with the transfer of data and ack packets, which are in the KDETH PSN space, as opposed to the IB PSN space. Therefore, the Second Leg could be considered as a separate state machine. As such, it is handled by a different work queue item which is scheduled along with the normal IB state machine work item. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds the TID packet builder for the responder side, which contains the state machine to build TID RDMA ACK packet for either TID RDMA WRITE DATA or TID RDMA RESYNC packets. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
To improve performance, the TID RDMA WRITE protocol is designed to own a second leg to send data and ack packets in the KDETH PSN space. This patch adds the packet builder for the requester side, which contains the state machine to build TID RDMA WRITE DATA and TID RDMA RESYNC packet. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds the logic to resend TID RDMA WRITE DATA packets. The tracking indices will be reset properly so that the correct TID entries will be used. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds a function to receive TID RDMA RESYNC packet on the responder side. The QP's hardware flow will be updated and all allocated software flows will be updated accordingly in order to drop all stale packets. Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds a function to build TID RDMA RESYNC packet, which is sent by the requester to notify the responder that no TID RDMA ACK packet has been received for a given KDETH PSN. Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds the TID RDMA retry timer to make sure that TID RDMA WRITE DATA packets for a segment are received successfully by the responder. This timer is generally armed when the last TID RDMA WRITE DATA packet for a segment is sent out and stopped when all TID RDMA DATA packets are acknowledged. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds a function to receive TID RDMA ACK packet, which could be an acknowledge to either a TID RDMA WRITE DATA packet or an TID RDMA RESYNC packet. For an ACK to TID RDMA WRITE DATA packet, the request segments are completed appropriately. For an ACK to a TID RDMA RESYNC packet, any pending segment flow information is updated accordingly. Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds a function to build TID RDMA ACJ packet, which is also in the KDETH PSN space for packet ordering. This packet is used to acknowledge the receiving of all the TID RDMA WRITE DATA packets before the given KDETH PSN. Similar to RC ACK packets, TID RDMA ACK packets could also be coalesced. Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds a function to receive TID RDMA WRITE DATA packet, which is in the KDETH PSN space in packet ordering. Due to the use of header suppression, software is generally only notified when the last data packet for a segment is received. This patch also adds code to handle KDETH EFLAGS errors for ingress TID RDMA WRITE DATA packets. Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds a function to build TID RDMA WRITE DATA packet. Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds a function to receive TID RDMA WRITE response. The TID entries will be stored for encoding TID RDMA WRITE DATA packet later. Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds the TID resource timer, which is used by the responder to free any TID resources that are allocated for TID RDMA WRITE request and not returned by the requester after a reasonable time. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds the function to build TID RDMA WRITE response. The main role of the TID RDMA WRITE RESP packet is to send TID entries to the requester so that they can be used to encode TID RDMA WRITE DATA packet. Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds the functions to receive TID RDMA WRITE request. The request will be stored in the QP's s_ack_queue. This patch also adds code to handle duplicate TID RDMA WRITE request and a function to allocate TID resources for data receiving on the responder side. Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
The s_ack_queue is managed by two pointers into the ring: r_head_ack_queue and s_tail_ack_queue. r_head_ack_queue is the index of where the next received request is going to be placed and s_tail_ack_queue is the entry of the request currently being processed. This works perfectly fine for normal Verbs as the requests are processed one at a time and the s_tail_ack_queue is not moved until the request that it points to is fully completed. In this fashion, s_tail_ack_queue constantly chases r_head_ack_queue and the two pointers can easily be used to determine "queue full" and "queue empty" conditions. The detection of these two conditions are imported in determining when an old entry can safely be overwritten with a new received request and the resources associated with the old request be safely released. When pipelined TID RDMA WRITE is introduced into this mix, things look very different. r_head_ack_queue is still the point at which a newly received request will be inserted, s_tail_ack_queue is still the currently processed request. However, with pipelined TID RDMA WRITE requests, s_tail_ack_queue moves to the next request once all TID RDMA WRITE responses for that request have been sent. The rest of the protocol for a particular request is managed by other pointers specific to TID RDMA - r_tid_tail and r_tid_ack - which point to the entries for which the next TID RDMA DATA packets are going to arrive and the request for which the next TID RDMA ACK packets are to be generated, respectively. What this means is that entries in the ring, which are "behind" s_tail_ack_queue (entries which s_tail_ack_queue has gone past) are no longer considered complete. This is where the problem is - a newly received request could potentially overwrite a still active TID RDMA WRITE request. The reason why the TID RDMA pointers trail s_tail_ack_queue is that the normal Verbs send engine uses s_tail_ack_queue as the pointer for the next response. Since TID RDMA WRITE responses are processed by the normal Verbs send engine, s_tail_ack_queue had to be moved to the next entry once all TID RDMA WRITE response packets were sent to get the desired pipelining between requests. Doing otherwise would mean that the normal Verbs send engine would not be able to send the TID RDMA WRITE responses for the next TID RDMA request until the current one is fully completed. This patch introduces the s_acked_ack_queue index to point to the next request to complete on the responder side. For requests other than TID RDMA WRITE, s_acked_ack_queue should always be kept in sync with s_tail_ack_queue. For TID RDMA WRITE request, it may fall behind s_tail_ack_queue. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
The TID RDMA WRITE protocol differs from normal IB RDMA WRITE in that TID RDMA WRITE requests do require responses, not just ACKs. Therefore, TID RDMA WRITE requests need to be treated as RDMA READ requests from the point of view of the QPs' s_ack_queue. In other words, the QPs' need to allow for TID RDMA WRITE requests to be stored in their s_ack_queue. However, because the user does not know anything about the TID RDMA capability and/or protocols, these extra entries in the queue cannot be advertized to the user. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds the functions to build TID RDMA WRITE request. The work request opcode, packet opcode, and packet formats for TID RDMA WRITE protocol are also defined in this patch. Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch makes the following changes to the static trace: 1. Adds the decoding of TID RDMA READ packets in IB header trace; 2. Tracks qpriv->s_flags and iow_flags in qpsleepwakeup trace; 3. Adds a new event to track RC ACK receiving; 4. Adds trace events for various stages of the TID RDMA READ protocol. These events provide a fine-grained control for monitoring and debugging the hfi1 driver in the filed. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch enables TID RDMA READ protocol by converting a qualified RDMA READ request into a TID RDMA READ request internally: (1) The TID RDMA capability must be enabled; (2) The request must start on a 4K page boundary and all receiving buffers must start on 4K page boundaries; (3) The request length must be a multiple of 4K and must be larger or equal to 256K. Each receiving buffer length must be a multiple of 4K. Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This locking mechanism is designed to provent vavious memory corruption scenarios from occurring when requests are pipelined, especially when RDMA READ/WRITE requests are interleaved with TID RDMA READ/WRITE requests: 1. READ-AFTER-READ; 2. READ-AFTER-WRITE; 3. WRITE-AFTER-READ; When memory corruption is likely, a request will be held back until previous requests have been completed. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch integrates the TID RDMA READ protocol into the IB RC protocol. This protocol is an end-to-end protocol between the hfi1 drivers on two OPA nodes that converts a qualified RDMA READ request into a TID RDMA READ request to avoid data copying on the requester side. The following codes are added in this patch: - Send the TID RDMA READ request; - Complete the TID RDMA READ send request; - Send the TID RDMA READ response; - Complete the TID RDMA READ request; Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds functions to retry TID RDMA READ request. Since TID RDMA READ request could be retried from any segment boundary, it requires a number of tracking fields in various structures and those fields should be reset properly. The qp->s_num_rd_atomic field is reset before retry and therefore should be incremented for each new or retried RDMA READ or atomic request. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This commit adds the TID RDMA READ pointers to the receiving opcode handlers. It also adds TID RDMA READ header sizes to header size table. A function to print the RHF EFLAGS errors is created so that it can be shared by both IB and TID RDMA receiving functions. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
This patch adds the functions to receive TID RDMA READ response. The TID resource information in the KDETH packet header will direct the hardware to deliver the packet payload to the user buffer automatically and the software will handle the packet header for the last packet of a segment as all other packet headers are suppressed by default. The TID entries will be freed when all packets for a segment have been received. This patch also adds the functions to handle KDETH eflag errors, including flow sequence and generation errors, when a TID RDMA READ response packet is received . The flow sequence error can be recovered by software checking of the flow sequence and will disappear when the hardware flow is programmed with a new generation number. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-