提交 · 838b6fd2d9ca29998869e4d1ecf4566efe807666 · openeuler / Kernel

06 2月, 2019 2 次提交

IB/hfi1: TID RDMA RcvArray programming and TID allocation · 838b6fd2

由 Kaike Wan 提交于 1月 23, 2019

TID entries are used by hfi1 hardware to receive data payload from
incoming packets directly into a user buffer and thus avoid data copying
by software. This patch implements the functions for TID allocation,
freeing, and programming TID RcvArray entries in hardware for kernel
clients. TID entries are managed via lists of TID groups similar to PSM.
Furthermore, to track TID resource allocation for each request, software
flows are also allocated and freed as needed. Since software flows
consume large amount of memory for tracking TID allocation and freeing,
it is generally desirable to allocate them dynamically in the send queue
and only for TID RDMA requests, but pre-allocate them for receive queue
because the send queue could have thousands of entries while the receive
queue has only a limited number of entries.
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NKaike Wan <kaike.wan@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

838b6fd2

IB/hfi1: TID RDMA flow allocation · 37356e78

由 Kaike Wan 提交于 2月 05, 2019

The hfi1 hardware flow is a hardware flow-control mechanism for a KDETH
data packet that is received on a hfi1 port. It validates the packet by
checking both the generation and sequence. Each QP that uses the TID RDMA
mechanism will allocate a hardware flow from its receiving context for
any incoming KDETH data packets.

This patch implements:
(1) a function to allocate hardware flow
(2) a function to free hardware flow
(3) a function to initialize hardware flow generation for a receiving
    context
(4) a wait mechanism if the hardware flow is not available
(4) a function to remove the qp from the wait queue for hardware flow
    when the qp is reset or destroyed.
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NKaike Wan <kaike.wan@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

37356e78

01 2月, 2019 2 次提交

IB/hfi1: Add OPFN helper functions for TID RDMA feature · d22a207d

由 Kaike Wan 提交于 1月 23, 2019

This patch adds the OPFN helper functions to initialize, encode, decode,
and reset OPFN parameters for the TID RDMA feature.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NKaike Wan <kaike.wan@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d22a207d

IB/hfi1: OPFN support discovery · 44e43d91

由 Mitko Haralanov 提交于 1月 24, 2019

OPFN (Omni Path Feature Negotiation) support discovery allows a RC QP to
announce that it supports OPFN and also discover if OPFN is supported by
the peer QP. OPFN parameter negotiation is skipped unless OPFN support is
first discovered. OPFN support is announced by claiming what was
the reserved bit in dword 1 of OmniPath modified base transport header
in requests and responses.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NAshutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NKaike Wan <kaike.wan@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

44e43d91

07 12月, 2018 1 次提交

IB/hfi1: Allow the driver to initialize QP priv struct · 5190f052

由 Mike Marciniszyn 提交于 11月 28, 2018

This patch adds an interface to allow the driver to initialize the QP priv
struct when the QP is created and after the qpn has been assigned. A
field is added to the QP priv struct to reference the rcd and two new
files are added to contain the function to initialize the rcd field so
that more TID RDMA related code can be added here later.
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NKaike Wan <kaike.wan@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

5190f052

04 10月, 2018 2 次提交

IB/{hfi1, qib, rdmavt}: Move send completion logic to rdmavt · 116aa033

由 Venkata Sandeep Dhanalakota 提交于 9月 26, 2018

Moving send completion code into rdmavt in order to have shared logic
between qib and hfi1 drivers.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: NBrian Welty <brian.welty@intel.com>
Signed-off-by: NVenkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: NHarish Chegondi <harish.chegondi@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

116aa033

IB/{hfi1, qib, rdmavt}: Move copy SGE logic into rdmavt · 019f118b

由 Brian Welty 提交于 9月 26, 2018

This patch moves hfi1_copy_sge() into rdmavt for sharing with qib.
This patch also moves all the wss_*() functions into rdmavt as
several wss_*() functions are called from hfi1_copy_sge()

When SGE copy mode is adaptive, cacheless copy may be done in some cases
for performance reasons. In those cases, X86 cacheless copy function
is called since the drivers that use rdmavt and may set SGE copy mode
to adaptive are X86 only. For this reason, this patch adds
"depends on X86_64" to rdmavt/Kconfig.
Reviewed-by: NAshutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NBrian Welty <brian.welty@intel.com>
Signed-off-by: NHarish Chegondi <harish.chegondi@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

019f118b

01 10月, 2018 2 次提交

IB/hfi1: Prepare resource waits for dual leg · 5da0fc9d

由 Dennis Dalessandro 提交于 9月 28, 2018

Current implementation allows each qp to have only one send engine.  As
such, each qp has only one list to queue prebuilt packets when send engine
resources are not available. To improve performance, it is desired to
support multiple send engines for each qp.

This patch creates the framework to support two send engines
(two legs) for each qp for the TID RDMA protocol, which can be easily
extended to support more send engines. It achieves the goal by creating a
leg specific struct, iowait_work in the iowait struct, to hold the
work_struct and the tx_list as well as a pointer to the parent iowait
struct.

The hfi1_pkt_state now has an additional field to record the current legs
work structure and that is now passed to all egress waiters to determine
the leg that needs to wait via a new iowait helper.  The APIs are adjusted
to use the new leg specific struct as required.

Many new and modified helpers are added to support this change.
Reviewed-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NKaike Wan <kaike.wan@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

5da0fc9d

IB/rdmavt: Rename check_send_wqe as setup_wqe · d205a06a

由 Kaike Wan 提交于 9月 26, 2018

The driver-provided function check_send_wqe allows the hardware driver to
check and set up the incoming send wqe before it is inserted into the swqe
ring. This patch will rename it as setup_wqe to better reflect its
usage. In addition, this function is only called when all setup is
complete in rdmavt.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NKaike Wan <kaike.wan@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d205a06a

11 9月, 2018 1 次提交

IB/{hfi1, qib, rdmavt}: Schedule multi RC/UC packets instead of posting · 0b79b277

由 Michael J. Ruhl 提交于 9月 10, 2018

The post_send() path determines if it should post directly or, schedule
the post for later.  The current logic is:

  if the swqe ring is empty or (for hfi1) wqe->length <= piothreshold
    post the send
  else
    schedule

This can allow large requests to call the send engine directly.  Large
requests can potentially produce a large number of packets prior to
returning to the caller, blocking the caller from posting more requests,
and allowing better parallel processing.

Allow the driver(s) more say in this logic (pass call_send to the driver,
rather than examining a return value).

Update hfi1/qib logic to schedule the send engine if an RC or UC message
is larger than the QP MTU size.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

0b79b277

24 5月, 2018 1 次提交

IB/hfi1: Define 16B Management Packets · 4171a693

由 Don Hiatt 提交于 5月 15, 2018

Add 16B Management Packet definition. This optimized packet
format replaces the ib_other_headers and BTH with a source
and destination QP number.

To support these packets we introduce struct opa_16b_mgmt
into the struct hfi1_16b_header.

This packet format is only used for MAD packets using the
IB_OPCODE_UD_SEND_ONLY opcode on QP0/1.

The original 16B implementation failed to use 16B management
packets so now we add their definition.
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4171a693

10 5月, 2018 2 次提交

IB/{hfi1, qib, rdmavt}: Move logic to allocate receive WQE into rdmavt · 832369fa

由 Brian Welty 提交于 5月 02, 2018

Moving receive-side WQE allocation logic into rdmavt will allow
further code reuse between qib and hfi1 drivers.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NBrian Welty <brian.welty@intel.com>
Signed-off-by: NHarish Chegondi <harish.chegondi@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

832369fa

IB/hfi1: Rework fault injection machinery · a74d5307

由 Mitko Haralanov 提交于 5月 02, 2018

The packet fault injection code present in the HFI1 driver had some
issues which not only fragment the code but also created user
confusion. Furthermore, it suffered from the following issues:

1. The fault_packet method only worked for received packets. This
meant that the only fault injection mode available for sent
packets is fault_opcode, which did not allow for random packet
drops on all egressing packets.
2. The mask available for the fault_opcode mode did not really work
due to the fact that the opcode values are not bits in a bitmask but
rather sequential integer values. Creating a opcode/mask pair that
would successfully capture a set of packets was nearly impossible.
3. The code was fragmented and used too many debugfs entries to
operate and control. This was confusing to users.
4. It did not allow filtering fault injection on a per direction basis -
egress vs. ingress.

In order to improve or fix the above issues, the following changes have
been made:

1. The fault injection methods have been combined into a single fault
injection facility. As such, the fault injection has been plugged
into both the send and receive code paths. Regardless of method used
the fault injection will operate on both egress and ingress packets.
2. The type of fault injection - by packet or by opcode - is now controlled
by changing the boolean value of the file "opcode_mode". When the value
is set to True, fault injection is done by opcode. Otherwise, by
packet.
2. The masking ability has been removed in favor of a bitmap that holds
opcodes of interest (one bit per opcode, a total of 256 bits). This
works in tandem with the "opcode_mode" value. When the value of
"opcode_mode" is False, this bitmap is ignored. When the value is
True, the bitmap lists all opcodes to be considered for fault injection.
By default, the bitmap is empty. When the user wants to filter by opcode,
the user sets the corresponding bit in the bitmap by echo'ing the bit
position into the 'opcodes' file. This gets around the issue that the set
of opcodes does not lend itself to effective masks and allow for extremely
fine-grained filtering by opcode.
4. fault_packet and fault_opcode methods have been combined. Hence, there
is only one debugfs directory controlling the entire operation of the
fault injection machinery. This reduces the number of debugfs entries
and provides a more unified user experience.
5. A new control files - "direction" - is provided to allow the user to
control the direction of packets, which are subject to fault injection.
6. A new control file - "skip_usec" - is added that would allow the user
to specify a "timeout" during which no fault injection will occur.

In addition, the following bug fixes have been applied:

1. The fault injection code has been split into its own header and source
files. This was done to better organize the code and support conditional
compilation without littering the code with #ifdef's.
2. The method by which the TX PIO packets were being marked for drop
conflicted with the way send contexts were being setup. As a result,
the send context was repeatedly being reset.
3. The fault injection only makes sense when the user can control it
through the debugfs entries. However, a kernel configuration can
enable fault injection but keep fault injection debugfs entries
disabled. Therefore, it makes sense that the HFI fault injection
code depends on both.
4. Error suppression did not take into account the method by which PIO
packets were being dropped. Therefore, even with error suppression
turned on, errors would still be displayed to the screen. A larger
enough packet drop percentage would case the kernel to crash because
the driver would be stuck printing errors.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NDon Hiatt <don.hiatt@intel.com>
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a74d5307

02 2月, 2018 4 次提交

IB/hfi1: Remove blind constants from 16B update · 78d3633b

由 Mike Marciniszyn 提交于 2月 01, 2018

These values were introduced as part of the 16B code to
account for the varying size of the LRH between the differing
packet formats.

Replace the blind constants with defines based on FIELD_SIZEOF()
calls.

Fixes: 5b6cabb0 ("IB/hfi1: Add 16B RC/UC support")
Reviewed-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

78d3633b

IB/hfi1: Look up ibport using a pointer in receive path · bdaf96f6

由 Sebastian Sanchez 提交于 2月 01, 2018

In the receive path, hfi1_ibport is looked up by indexing into an
array. A profile shows this to be expensive. The receive context
data has a pointer to the ibport data, use that pointer instead.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NSebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

bdaf96f6

IB/hfi1: Optimize packet type comparison using 9B and bypass code paths · 6d6b8848

由 Sebastian Sanchez 提交于 2月 01, 2018

The packet type comparison used to find out if a packet is a bypass
packet in the hot path is an expensive operation as seen in a profile.

Determine packet's pkey and migration bit through the bypass and 9B
code paths instead.
Reviewed-by: NDon Hiatt <don.hiatt@intel.com>
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NSebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

6d6b8848

IB/hfi1: Remove dependence on qp->s_hdrwords · 9636258f

由 Mitko Haralanov 提交于 2月 01, 2018

The s_hdrwords variable was used to indicate whether a
packet was already built on a previous iteration of the
send engine. This variable assumed the protection of the
QP's RVT_S_BUSY flag, which was required since the the
QP's s_lock was dropped just prior to the packet being
queued on the one of the egress mechanisms.

Support for multiple send engine instantiations require
that the field not be used due to concurency issues.
The ps.txreq signals the "already built" without the
potential concurency issues.

Fix by getting rid of all s_hdrword usage.   A wrapper
is added to test for the already built case that used to
use s_hdrwords.

What used to be stored in s_hdrwords is now in the txreq.
The PBC is not counted, but is added in the pio/sdma code
paths prior to posting the packet.
Reviewed-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

9636258f

23 8月, 2017 9 次提交

IB/hfi1: Remove HFI1_VERBS_31BIT_PSN option · 61654679

由 Grzegorz Morys 提交于 8月 13, 2017

Remove HFI1_VERBS_31BIT_PSN Kconfig option leaving only 31-bit PSNs
available. The option was implemented in the early days of the driver
and is no longer needed.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NGrzegorz Morys <grzegorz.morys@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

61654679

IB/hfi1: Enhance PIO/SDMA send for 16B · 566d53a8

由 Don Hiatt 提交于 8月 04, 2017

PIO/SDMA send logic now uses the hdr_type field to determine
the type of packet that has been constructed. Based on the hdr_type,
certain things such as PBC flags, padding count and the LT extra
trailing bytes are determined.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

566d53a8

IB/hfi1: Add 16B RC/UC support · 5b6cabb0

由 Don Hiatt 提交于 8月 04, 2017

Add 16B bypass packet support for RC/UC traffic types.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NDasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5b6cabb0

IB/rdmavt, hfi1, qib: Enhance rdmavt and hfi1 to use 32 bit lids · 51e658f5

由 Dasaratharaman Chandramouli 提交于 8月 04, 2017

Increase lid used in hfi1 driver to 32 bits. qib continues
to use 16 bit lids.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

51e658f5

IB/hfi1: Add 16B UD support · 88733e3b

由 Don Hiatt 提交于 8月 04, 2017

Add 16B bypass packet support for UD traffic types.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

88733e3b

IB/hfi1: Determine 9B/16B L2 header type based on Address handle · d98bb7f7

由 Don Hiatt 提交于 8月 04, 2017

When address handle attributes are initialized, the LIDs are
transformed to be in the 32 bit LID space.
When constructing the header, hfi1 driver will look at the LID
to determine the packet header to be created.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d98bb7f7

IB/hfi1: Add support to process 16B header errors · 5786adf3

由 Don Hiatt 提交于 8月 04, 2017

Enhance hdr_rcverr() to also handle errors during
16B bypass packet receive.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5786adf3

IB/hfi1: Add support to send 16B bypass packets · 30e07416

由 Don Hiatt 提交于 8月 04, 2017

We introduce struct hfi1_opa_header as a union
of ib (9B) and 16B headers.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

30e07416

IB/hfi1: Add support to receive 16B bypass packets · 72c07e2b

由 Don Hiatt 提交于 8月 04, 2017

We introduce a struct hfi1_16b_header to support 16B headers.
16B bypass packets are received by the driver and processed
similar to 9B packets. Add basic support to handle 16B packets.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

72c07e2b

01 8月, 2017 1 次提交

IB/hfi1: Serve the most starved iowait entry first · bcad2913

由 Kaike Wan 提交于 7月 24, 2017

When an egress resource(SDMA descriptors, pio credits) is not available,
a sending thread will be put on the resource's wait queue. When the
resource becomes available again, up to a fixed number of sending threads
can be awakened sequentially and removed from the wait queue, depending
on the number of waiting threads and the number of free resources. Since
each awakened sending thread will send as many packets as possible, it
is highly likely that the first sending thread will consume all the
egress resources. Subsequently, it will be put back to the end of the wait
queue. Depending on the timing when the later sending threads wake up,
they may not be able to send any packet and be again put back to the end
of the wait queue sequentially, right behind the first sending thread.
This starvation cycle continues until some sending threads exceed their
retry limit and consequently fail.

This patch fixes the issue by two simple approaches:
(1) Any starved sending thread will be put to the head of the wait queue
while a served sending thread will be put to the tail;
(2) The most starved sending thread will be served first.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NKaike Wan <kaike.wan@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

bcad2913

28 6月, 2017 2 次提交

IB/hfi1,qib: Do not send QKey trap for UD qps · 13d84914

由 Dennis Dalessandro 提交于 5月 29, 2017

According to IBTA spec a QKey violation should not result in a bad qkey
trap being triggered for UD queue pairs. Also since it is a silent error
we do not increment the q_key violation or the dropped packet counters.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

13d84914

IB/hfi1: Setup common IB fields in hfi1_packet struct · 9039746c

由 Don Hiatt 提交于 5月 12, 2017

We move many common IB fields into the hfi1_packet structure and
set them up in a single function. This allows us to set the fields
in a single place and not deal with them throughout the driver.
Reviewed-by: NBrian Welty <brian.welty@intel.com>
Reviewed-by: NDasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

9039746c

05 5月, 2017 2 次提交

IB/hfi1: Fix yield logic in send engine · dd1ed108

由 Mike Marciniszyn 提交于 5月 04, 2017

When there are many RC QPs and an RDMA READ request
is sent, timeouts occur on the requester side because
of fairness among RC QPs on their relative SDMA engine
on the responder side.  This also hits write and send, but
to a lesser extent.

Complicating the issue is that the current code checks if workqueue
is congested before scheduling other QPs, however, this
check is based on the number of active entries in the
workqueue, which was found to be too big to for
workqueue_congested() to be effective.

Fix by reducing the number of active entries as revealed by
experimentation from the default of num_sdma to
HFI1_MAX_ACTIVE_WORKQUEUE_ENTRIES.  Retry counts were monitored
to determine the correct value.

Tracing to investigate any future issues is also added.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NSebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

dd1ed108

IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line · 688f21c0

由 Mike Marciniszyn 提交于 5月 04, 2017

This field is causing excessive cache line bouncing.

There are spare bytes in the r_lock cache line so the best approach
is to make an rvt QP field and remove from the hfi1 priv field.
Signed-off-by: NSebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

688f21c0

02 5月, 2017 2 次提交

IB/core: Use rdma_ah_attr accessor functions · d8966fcd

由 Dasaratharaman Chandramouli 提交于 4月 29, 2017

Modify core and driver components to use accessor functions
introduced to access individual fields of rdma_ah_attr
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NDon Hiatt <don.hiatt@intel.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Reviewed-by: NNiranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: NDasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d8966fcd

IB/core: Rename struct ib_ah_attr to rdma_ah_attr · 90898850

由 Dasaratharaman Chandramouli 提交于 4月 29, 2017

This patch simply renames struct ib_ah_attr to
rdma_ah_attr as these fields specify attributes that are
not necessarily specific to IB.
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NDon Hiatt <don.hiatt@intel.com>
Reviewed-by: NNiranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NDasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

90898850

29 4月, 2017 1 次提交

IB/hfi1: Prevent kernel QP post send hard lockups · b6eac931

由 Mike Marciniszyn 提交于 4月 09, 2017

The driver progress routines can call cond_resched() when
a timeslice is exhausted and irqs are enabled.

If the ULP had been holding a spin lock without disabling irqs and
the post send directly called the progress routine, the cond_resched()
could yield allowing another thread from the same ULP to deadlock
on that same lock.

Correct by replacing the current hfi1_do_send() calldown with a unique
one for post send and adding an argument to hfi1_do_send() to indicate
that the send engine is running in a thread.   If the routine is not
running in a thread, avoid calling cond_resched().

CC: <stable@vger.kernel.org> # 4.7.x-
Fixes: Commit 831464ce ("IB/hfi1: Don't call cond_resched in atomic mode when sending packets")
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b6eac931

06 4月, 2017 2 次提交

IB/hfi1: Add transmit fault injection feature · 243d9f43

由 Don Hiatt 提交于 3月 20, 2017

Add ability to fault packets on transmit by opcode.
Dropping by packet can be achieved by setting the mask to 0.

In order to drop non-verbs traffic we set PbcInsertHrc
to NONE (0x2). The packet will still be delivered to
the receiving node but a KHdrHCRCErr (KDETH packet
with a bad HCRC) will be triggered and the packet will
not be delivered to the correct context.

In order to drop regular verbs traffic we set the
PbcTestEbp flag. The packet will still be delivered
to the receiving node but a 'late ebp error' will
be triggered and will be dropped.

A global toggle (/sys/kernel/debug/hfi1/hfi1_X/fault_suppress_err)
has been added to suppress the error messages on the receive
node when a packet was faulted on the sending node.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

243d9f43

IB/hfi1: Add receive fault injection feature · 0181ce31

由 Don Hiatt 提交于 3月 20, 2017

Add fault injection capability:
  - Drop packets unconditionally (fault_by_packet)
  - Drop packets based on opcode (fault_by_opcode)

This feature reacts to the global FAULT_INJECTION
config flag.

The faulting traces have been added:
  - misc/fault_opcode
  - misc/fault_packet

See 'Documentation/fault-injection/fault-injection.txt'
for details.

Examples:
  - Dropping packets by opcode:
    /sys/kernel/debug/hfi1/hfi1_X/fault_opcode
	# Enable fault
	echo Y > fault_by_opcode
	# Setprobability of dropping (0-100%)
	# echo 25 > probability
	# Set opcode
	echo 0x64 > opcode
	# Number of times to fault
	echo 3 > times
	# An optional mask allows you to fault
	# a range of opcodes
	echo 0xf0 > mask
    /sys/kernel/debug/hfi1/hfi1_X/fault_stats
    contains a value in parentheses to indicate
    number of each opcode dropped.

  - Dropping packets unconditionally
    /sys/kernel/debug/hfi1/hfi1_X/fault_packet
	# Enable fault
	echo Y > fault_by_packet
    /sys/kernel/debug/hfi1/hfi1_X/fault_packet/fault_stats
    contains the number of packets dropped.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDon Hiatt <don.hiatt@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0181ce31

19 2月, 2017 4 次提交

IB/hfi1, rdmavt: Move SGE state helper routines into rdmavt · 1198fcea

由 Brian Welty 提交于 2月 08, 2017

To improve code reuse, add small SGE state helper routines to rdmavt_mr.h.
Leverage these in hfi1, including refactoring of hfi1_copy_sge.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NBrian Welty <brian.welty@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

1198fcea

IB/hfi1, rdmavt: Update copy_sge to use boolean arguments · 0128fcea

由 Brian Welty 提交于 2月 08, 2017

Convert copy_sge and related SGE state functions to use boolean.
For determining if QP is in user mode, add helper function in rdmavt_qp.h.
This is used to determine if QP needs the last byte ordering.
While here, change rvt_pd.user to a boolean.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NBrian Welty <brian.welty@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0128fcea

IB/hfi1: Use new rdmavt timers · 56acbbfb

由 Venkata Sandeep Dhanalakota 提交于 2月 08, 2017

Reduce hfi1 code footprint by using the rdmavt timers.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: NBrian Welty <brian.welty@intel.com>
Signed-off-by: NVenkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

56acbbfb

IB/hfi1, qib, rdmavt: Move AETH credit functions into rdmavt · 696513e8

由 Brian Welty 提交于 2月 08, 2017

Add rvt_compute_aeth() and rvt_get_credit() as shared functions in
rdmavt, moved from hfi1/qib logic.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NBrian Welty <brian.welty@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

696513e8

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功