1. 21 Sep 2018, 2 commits
    • IB/hfi1: Fix destroy_qp hang after a link down · b4a4957d
      Authored by Michael J. Ruhl
      rvt_destroy_qp() cannot complete until all in-process packets have
      been released from the underlying hardware.  If a link down event
      occurs, an application can hang with a kernel stack similar to:
      
      cat /proc/<app PID>/stack
       quiesce_qp+0x178/0x250 [hfi1]
       rvt_reset_qp+0x23d/0x400 [rdmavt]
       rvt_destroy_qp+0x69/0x210 [rdmavt]
       ib_destroy_qp+0xba/0x1c0 [ib_core]
       nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
       nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
       nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
       nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
       process_one_work+0x17a/0x440
       worker_thread+0x126/0x3c0
       kthread+0xcf/0xe0
       ret_from_fork+0x58/0x90
       0xffffffffffffffff
      
      quiesce_qp() waits until all outstanding packets have been freed.
      This wait should be momentary.  During a link down event, the cleanup
      handling does not ensure that all packets caught by the link down are
      flushed properly.
      
      This is caused by the fact that the freeze path and the link down
      event are handled the same way.  This is not correct.  The freeze
      path waits until the HFI is unfrozen and then restarts PIO.  A link
      down is not a freeze event.  The link down path cannot restart PIO
      until the link is restored.  If the PIO path is restarted before the
      link comes up, the application (QP) using the PIO path will hang
      (until the link is restored).
      
      Fix this by separating the link down path from the freeze path and
      using the link down path for link down events.
      
      Close a race condition in sc_disable() by acquiring both the progress
      and release locks.
      
      Close a race condition in sc_stop() by moving the setting of the flag
      bits under the alloc lock.
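
      The sketch below illustrates the shape of these three changes.  It is
      not the hfi1 driver's actual code; every structure, lock, flag, and
      function name in it is a placeholder chosen for illustration.

      ----
      /* Illustrative sketch only -- all identifiers are hypothetical. */
      #include <linux/spinlock.h>
      #include <linux/types.h>

      #define SCF_ENABLED   0x01
      #define SCF_FROZEN    0x02    /* set on a freeze event */
      #define SCF_LINK_DOWN 0x04    /* set on a link down event; no longer treated as a freeze */

      struct sc_sketch {
              spinlock_t alloc_lock;          /* guards buffer allocation and sc->flags */
              spinlock_t progress_lock;       /* guards the send-progress path */
              spinlock_t release_lock;        /* guards buffer release / credit return */
              unsigned int flags;
      };

      /* sc_stop(): set the state bits under the alloc lock so a concurrent
       * allocation cannot slip in between the flag test and the update. */
      static void sc_stop_sketch(struct sc_sketch *sc, unsigned int bits)
      {
              unsigned long irqflags;

              spin_lock_irqsave(&sc->alloc_lock, irqflags);
              sc->flags &= ~SCF_ENABLED;
              sc->flags |= bits;
              spin_unlock_irqrestore(&sc->alloc_lock, irqflags);
      }

      /* sc_disable(): hold both the progress and release locks so neither
       * path can observe a half-disabled context while buffers are failed. */
      static void sc_disable_sketch(struct sc_sketch *sc)
      {
              spin_lock_irq(&sc->progress_lock);
              spin_lock(&sc->release_lock);
              /* ... mark the hardware context disabled, flush pending buffers ... */
              spin_unlock(&sc->release_lock);
              spin_unlock_irq(&sc->progress_lock);
      }

      /* Freeze: PIO may be restarted as soon as the HFI is unfrozen. */
      static void handle_freeze_sketch(struct sc_sketch *sc)
      {
              sc_stop_sketch(sc, SCF_FROZEN);
              /* ... wait for the unfreeze, then restart PIO ... */
      }

      /* Link down: flush what the hardware holds, but leave PIO down until
       * the link returns; restarting earlier is what hung the QP above. */
      static void handle_linkdown_sketch(struct sc_sketch *sc)
      {
              sc_stop_sketch(sc, SCF_LINK_DOWN);
              sc_disable_sketch(sc);
      }
      ----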
      
      Cc: <stable@vger.kernel.org> # 4.9.x+
      Fixes: 77241056 ("IB/hfi1: add driver files")
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/hfi1: Fix context recovery when PBC has an UnsupportedVL · d623500b
      Authored by Michael J. Ruhl
      If a packet stream uses an UnsupportedVL (virtual lane), the send
      engine will not send the packet, and it will not indicate that an
      error has occurred.  This will cause the packet stream to block.
      
      HFI has 8 virtual lanes available for packet streams.  Each lane can
      be enabled or disabled using the UnsupportedVL mask.  If a lane is
      disabled, adding a packet to the send context must be disallowed.
      
      The current mask for determining unsupported VLs defaults to 0 (allow
      all).  This is incorrect.  Only the VLs that are defined should be
      allowed.
      
      Determine which VLs are disabled (mtu == 0), and set the appropriate
      unsupported bit in the mask.  The correct mask will allow the send
      engine to error on the invalid VL, and error recovery will work
      correctly.
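
      A minimal sketch of that mask computation follows; the constant and
      function names are illustrative assumptions, not the driver's own.

      ----
      #include <linux/bits.h>
      #include <linux/types.h>

      #define NUM_DATA_VLS 8          /* 8 virtual lanes available for packet streams */

      static u16 build_unsupported_vl_mask(const u16 vl_mtu[NUM_DATA_VLS])
      {
              u16 mask = 0;
              int vl;

              for (vl = 0; vl < NUM_DATA_VLS; vl++)
                      if (vl_mtu[vl] == 0)    /* VL not configured: mark it unsupported */
                              mask |= BIT(vl);

              /* A mask of 0 would mean "allow all VLs", which is the old bug. */
              return mask;
      }
      ----

      With the mask set, a send on a disabled VL raises a send engine error
      instead of silently blocking, so the existing recovery path can run.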
      
      Cc: <stable@vger.kernel.org> # 4.9.x+
      Fixes: 77241056 ("IB/hfi1: add driver files")
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  2. 22 Jun 2018, 1 commit
  3. 20 Jun 2018, 1 commit
  4. 10 May 2018, 1 commit
  5. 02 Feb 2018, 1 commit
  6. 14 Nov 2017, 1 commit
  7. 25 Oct 2017, 1 commit
    • locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() · 6aa7de05
      Authored by Mark Rutland
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
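
      For reference, a hand-written example of the resulting transformation
      (made-up struct and field names, not taken from any particular driver):

      ----
      #include <linux/compiler.h>

      struct ring_sketch {
              unsigned int head;
              unsigned int tail;
      };

      static void publish_head(struct ring_sketch *r, unsigned int new_head)
      {
              /* was: ACCESS_ONCE(r->head) = new_head; */
              WRITE_ONCE(r->head, new_head);
      }

      static unsigned int snapshot_tail(struct ring_sketch *r)
      {
              /* was: return ACCESS_ONCE(r->tail); */
              return READ_ONCE(r->tail);
      }
      ----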
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 01 Aug 2017, 2 commits
  9. 21 Apr 2017, 1 commit
  10. 12 Dec 2016, 2 commits
  11. 04 Dec 2016, 1 commit
  12. 16 Nov 2016, 4 commits
  13. 02 Oct 2016, 1 commit
  14. 03 Aug 2016, 2 commits
  15. 18 Jun 2016, 2 commits
  16. 27 May 2016, 1 commit
  17. 26 May 2016, 1 commit
  18. 29 Apr 2016, 1 commit
    • IB/hfi1: Reduce kernel context pio buffer allocation · 44306f15
      Authored by Jianxin Xiong
      The pio buffers were pooled evenly among all kernel contexts and
      user contexts.  However, the demand from kernel contexts is much
      lower than that from user contexts.  This patch reduces the
      allocation for kernel contexts and thus makes more credits available
      for PSM, helping performance.  This is especially useful on high
      core-count systems where large numbers of contexts are used.
      
      A new context type SC_VL15 is added to distinguish the context used
      for VL15 from other kernel contexts.  The reason is that VL15 needs
      to support 2KB-sized packets, while other kernel contexts need only
      support packets up to the size determined by "piothreshold", which
      has a default value of 256.
      
      The new allocation method allows triple buffering of the largest pio
      packets configured for these contexts.  This is sufficient to maintain
      verbs performance.  The largest pio packet size is 2048B for VL15
      and "piothreshold" for other kernel contexts.  A cap is applied to
      "piothreshold" to avoid excessive buffer allocation.
      
      The special case where SDMA is disabled is handled differently.  In
      that case, the original pooling allocation is used to better
      support the much higher pio traffic.
      
      Notice that if adaptive pio is disabled (piothreshold == 0), the pio
      buffer size doesn't matter for non-VL15 kernel send contexts when
      SDMA is enabled, because pio is not used at all on these contexts,
      and thus the new allocation is still valid.  If SDMA is disabled,
      pooling allocation is used as mentioned in the previous paragraph.
      
      An adjustment is also made to the calculation of the credit return
      threshold for the kernel contexts.  Instead of being based purely on
      the MTU size, a percentage-based threshold is also considered and
      the smaller of the two is chosen.  This is necessary to ensure
      that, with the reduced buffer allocation, credits are returned in
      time to avoid unnecessary stalls in the send path.
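
      A sketch of that threshold choice (illustrative; the helper name and
      the percentage parameter are assumptions):

      ----
      #include <linux/kernel.h>
      #include <linux/types.h>

      /* Return credits early enough that the reduced pool does not stall
       * senders: take the smaller of an MTU-based threshold and a
       * percentage of the context's total credits. */
      static u32 credit_return_threshold(u32 total_credits,
                                         u32 mtu_based_credits, u32 percent)
      {
              u32 pct_based = (total_credits * percent) / 100;

              return min(mtu_based_credits, pct_based);
      }
      ----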

      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Dean Luick <dean.luick@intel.com>
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Reviewed-by: Mark Debbage <mark.debbage@intel.com>
      Reviewed-by: Jubin John <jubin.john@intel.com>
      Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  19. 18 Mar 2016, 2 commits
  20. 11 Mar 2016, 12 commits