1. 13 Feb 2018 (3 commits)
  2. 27 Jan 2018 (1 commit)
  3. 24 Jan 2018 (1 commit)
    • i40e/i40evf: Detect and recover hung queue scenario · 07d44190
      Authored by Sudheer Mogilappagari
      In VFs, there is a known issue which can cause writebacks
      not to occur when interrupts are disabled and there are
      fewer than 4 descriptors, resulting in a TX timeout. A timeout
      can also occur due to a lost interrupt.
      
      The current implementation for detecting and recovering
      from hung queues in the PF is problematic because it
      actively encourages lost interrupts.  By triggering a SW
      interrupt, interrupts are forced on.  If we are already in
      napi_poll and an interrupt fires, napi_poll will not be
      rescheduled and the interrupt is effectively lost, thereby
      potentially *causing* hung queues.
      
      This patch checks whether packets are being processed between
      watchdog cycles, uses that to detect a potentially hung queue,
      and triggers a SW interrupt only for that particular queue
      (see the sketch below).
      Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
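      As a rough illustration of the watchdog check described above (a
      sketch, not the driver's exact code; the prev_pkt field and the
      exact helper signatures are assumptions):

        static void detect_recover_hung_queue(struct i40e_vsi *vsi, int q_idx)
        {
                struct i40e_ring *tx_ring = vsi->tx_rings[q_idx];
                u64 packets = tx_ring->stats.packets;

                /* no packets processed since the last watchdog cycle, but
                 * descriptors still pending: likely a hung queue, so fire
                 * a SW interrupt for this queue only
                 */
                if (tx_ring->prev_pkt == packets && i40e_get_tx_pending(tx_ring))
                        i40e_force_wb(vsi, tx_ring->q_vector);

                tx_ring->prev_pkt = packets;
        }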
  4. 04 Jan 2018 (1 commit)
    • i40e/i40evf: Account for frags split over multiple descriptors in check linearize · 248de22e
      Authored by Alexander Duyck
      The original code for __i40e_chk_linearize didn't take into account the
      fact that if a fragment is 16K in size or larger it has to be split over 2
      descriptors and the smaller of those 2 descriptors will be on the trailing
      edge of the transmit. As a result we could get into situations where we
      failed to catch requests that could result in a Tx hang.
      
      This patch takes care of that by subtracting the length of all but the
      trailing edge of the stale fragment before we test the sum. By doing this
      we can guarantee that we have all cases covered, including the case of a
      fragment that spans multiple descriptors. We don't need to worry about
      checking the inner portions of this since 12K is the maximum aligned DMA
      size and that is larger than any MSS will ever be since the MTU limit for
      jumbos is something on the order of 9K.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
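      The trailing-edge math reduces to something like this (a sketch using
      the constants named above; the real __i40e_chk_linearize folds this
      into its walk over the skb fragments):

        /* A descriptor carries at most 16K - 1 bytes and the maximum
         * aligned DMA chunk is 12K, so a large fragment is consumed in
         * 12K chunks with only the remainder on the trailing edge.
         */
        #define MAX_DATA_PER_TXD          (16 * 1024 - 1)
        #define MAX_DATA_PER_TXD_ALIGNED  (12 * 1024)

        static int stale_trailing_edge(int stale_size)
        {
                /* subtract all but the trailing edge of the fragment */
                while (stale_size > MAX_DATA_PER_TXD)
                        stale_size -= MAX_DATA_PER_TXD_ALIGNED;
                return stale_size;
        }

      Only the value returned here is subtracted from the running sum when
      the fragment goes stale, so a 16K fragment correctly contributes its
      smaller 4K trailing descriptor rather than its full length.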
  5. 22 Nov 2017 (1 commit)
  6. 01 Nov 2017 (1 commit)
  7. 10 Oct 2017 (2 commits)
    • i40e/i40evf: bump tail only in multiples of 8 · 11f29003
      Authored by Jacob Keller
      Hardware only fetches descriptors on cachelines of 8, essentially
      ignoring the lower 3 bits of the tail register. It is therefore
      pointless to bump tail by an unaligned value, as the hardware will
      ignore some of the new descriptors we allocated. Ideally, we should
      ensure tail writes are always aligned to a multiple of 8.
      
      At first, it seems like we'd already do this, since we allocate
      descriptors in batches which are a multiple of 8. Since we'd always
      increment by a multiple of 8, it seems like the value should always be
      aligned.
      
      However, this ignores allocation failures. If we fail to allocate
      a buffer, our tail register will become unaligned. Once it has become
      unaligned it will essentially be stuck unaligned until a buffer
      allocation happens to fail at the exact amount necessary to re-align it.
      
      We can do better by simply rounding down the number of buffers we're
      about to allocate (cleaned_count) such that "next_to_clean
      + cleaned_count" is rounded down to a multiple of 8.
      
      We do this by calculating how far off that value is and subtracting it
      from the cleaned_count. This essentially defers allocation of buffers if
      they're going to be ignored by hardware anyway, and re-aligns our
      next_to_use and tail values after a failure to allocate a descriptor.
      
      This calculation ensures that we always align the tail writes in a way
      the hardware expects and don't unnecessarily allocate buffers which
      won't be fetched immediately.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
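      The rounding described above amounts to a one-line adjustment (a
      sketch; field names assumed):

        /* Defer buffers that would leave tail unaligned: shrink
         * cleaned_count so that next_to_clean + cleaned_count lands on
         * a multiple of 8, matching the hardware's cacheline fetches.
         */
        cleaned_count -= (rx_ring->next_to_clean + cleaned_count) & 0x7;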
    • i40e/i40evf: always set the CLEARPBA flag when re-enabling interrupts · dbadbbe2
      Authored by Jacob Keller
      In the past we changed driver behavior to not clear the PBA when
      re-enabling interrupts. This change was motivated by the flawed belief
      that clearing the PBA would cause a lost interrupt if a receive
      interrupt occurred while interrupts were disabled.
      
      According to empirical testing this isn't the case. Additionally, the
      data sheet specifically says that we should set the CLEARPBA bit when
      re-enabling interrupts in a polling setup.
      
      This reverts commit 40d72a50 ("i40e/i40evf: don't lose interrupts")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
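      For reference, re-enabling a vector with the CLEARPBA bit set looks
      roughly like this (register macros as in i40e_register.h; the exact
      enable path varies by driver version):

        u32 val = I40E_PFINT_DYN_CTLN_INTENA_MASK |
                  I40E_PFINT_DYN_CTLN_CLEARPBA_MASK |   /* clear pending */
                  (I40E_ITR_NONE << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT);

        wr32(&pf->hw, I40E_PFINT_DYN_CTLN(vector - 1), val);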
  8. 30 Sep 2017 (1 commit)
  9. 28 Aug 2017 (3 commits)
    • i40e/i40evf: avoid dynamic ITR updates when polling or low packet rate · 742c9875
      Authored by Jacob Keller
      The dynamic ITR algorithm depends on a calculation of usecs which
      assumes that the interrupts have been firing constantly at the interrupt
      throttle rate. This is not guaranteed because we could have a low packet
      rate, or have been polling in software.
      
      We estimate whether this is the case by using jiffies to measure how
      long it has been since the last ITR update. If the jiffies difference
      is large, the calculation is guaranteed to be incorrect. If it is
      small, we might have done some polling, but the difference shouldn't
      affect the calculation much.
      
      This ensures that we don't get stuck in BULK latency during certain rare
      situations where we receive bursts of packets that force us into NAPI
      polling.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
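      A sketch of that guard (field names assumed; jiffies_to_usecs is the
      standard kernel helper):

        /* If it has been too long since the last update, the usecs value
         * assumed by the algorithm is wrong, so skip this adjustment.
         */
        unsigned long elapsed = jiffies - rc->last_itr_update;

        if (jiffies_to_usecs(elapsed) > itr_interval_usecs) {
                rc->last_itr_update = jiffies;
                return;         /* keep the current ITR setting */
        }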
    • i40e/i40evf: remove ULTRA latency mode · 0a2c7722
      Authored by Jacob Keller
      Since commit c56625d5 ("i40e/i40evf: change dynamic interrupt
      thresholds") a new higher latency ITR setting called I40E_ULTRA_LATENCY
      was added with a cryptic comment about how it was meant for adjusting Rx
      more aggressively when streaming small packets.
      
      This mode was attempting to calculate packets per second and then kick
      in when we have a huge number of small packets.
      
      Unfortunately, the ULTRA setting was kicking in for workloads it wasn't
      intended for including single-thread UDP_STREAM workloads.
      
      This wasn't caught for a variety of reasons. First, the ip_defrag
      routines were improved somewhat which makes the UDP_STREAM test still
      reasonable at 10GbE, even when dropped down to 8k interrupts a second.
      Additionally, some other obvious workloads appear to work fine, such
      as TCP_STREAM.
      
      The 40k packets-per-second threshold doesn't make sense for a number of
      reasons. First, we absolutely can do more than 40k packets per second.
      Second, we calculate the value inline in an integer, which can sometimes
      overflow, resulting in incorrect values.
      
      If we fix this overflow, it becomes even more likely that we'll enter
      ULTRA mode, which is the opposite of what we want.
      
      The ULTRA mode was added originally as a way to reduce CPU utilization
      during a small packet workload where we weren't keeping up anyways. It
      should never have been kicking in during these other workloads.
      
      Given the issues outlined above, let's remove the ULTRA latency mode. If
      necessary, a better solution to the CPU utilization issue for small
      packet workloads will be added in a future patch.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
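      The inline overflow mentioned above is easy to demonstrate (a
      constructed example, not the removed code itself):

        u32 packets = 5000, usecs = 50000;      /* 5000 pkts in 50 ms */

        /* 5000 * 1000000 = 5e9 wraps in 32-bit math, giving ~14k pps */
        u32 pps_bad = (packets * 1000000u) / usecs;

        /* widening to 64 bits gives the correct 100000 pps */
        u32 pps_ok = (u32)(((u64)packets * 1000000u) / usecs);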
    • i40e: invert logic for checking incorrect cpu vs irq affinity · 6d977729
      Authored by Jacob Keller
      In commit 96db776a ("i40e/vf: fix interrupt affinity bug")
      we added some code to force exit of polling in case we did
      not have the correct CPU. This is important since it was possible for
      the IRQ affinity to be changed while the CPU is pegged at 100%. This can
      result in the polling routine being stuck on the wrong CPU until
      traffic finally stops.
      
      Unfortunately, the implementation, "if the CPU is correct, exit as
      normal, otherwise, fall-through to the end-polling exit" is incredibly
      confusing to reason about. In this case, the normal flow looks like the
      exception, while the exception actually occurs far away from the if
      statement and comment.
      
      We recently discovered and fixed a bug in this code because we were
      incorrectly initializing the affinity mask.
      
      Re-write the code so that the exceptional case is handled at the check,
      rather than having the logic be spread through the regular exit flow.
      This does end up with minor code duplication, but the resulting code is
      much easier to reason about.
      
      The new logic is identical, but inverted. If we are running on a CPU not
      in our affinity mask, we'll exit polling. However, the code flow is much
      easier to understand.
      
      Note that we don't actually have to check for MSI-X, because in the MSI
      case we'll only have one q_vector, but its default affinity mask should
      be correct as it includes all CPUs when it's initialized. Further, we
      could at some point add code to set up the notifier for the non-MSI-X
      case and enable this workaround for that case too, if desired, though
      there isn't much gain since it's unlikely to be the common case.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
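      The inverted check sits at the top of the not-complete branch of the
      napi poll routine, roughly (a sketch of the shape, not a verbatim
      excerpt):

        if (!clean_complete) {
                int cpu_id = smp_processor_id();

                /* exceptional case handled right at the check: stop
                 * polling on the wrong CPU so the interrupt can move
                 */
                if (!cpumask_test_cpu(cpu_id, &q_vector->affinity_mask)) {
                        napi_complete_done(napi, work_done);
                        i40e_force_wb(vsi, q_vector);   /* resume via IRQ */
                        return budget - 1;
                }

                /* normal flow: keep polling on the right CPU */
                return budget;
        }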
  10. 26 Jul 2017 (1 commit)
  11. 06 Jun 2017 (1 commit)
  12. 30 Apr 2017 (2 commits)
    • i40e: use DECLARE_BITMAP for state fields · 0da36b97
      Authored by Jacob Keller
      Instead of assuming our flags fit within an unsigned long, use
      DECLARE_BITMAP which will ensure that we always allocate enough space.
      Additionally, use __I40E_STATE_SIZE__ markers as the last element of the
      enumeration so that the size of the BITMAP is compile-time assigned
      rather than programmer-time assigned. This ensures that potential future
      flag additions do not actually overrun the array. This is especially
      important as 32-bit systems have only 32-bit longs instead of the
      64-bit longs the prior code generally assumed.
      
      This change also drops the address-of operator on the state fields
      throughout the code, so it does have a bit of code churn. The conversions were
      automated using sed replacements with an alternation
      
        s/&(vsi->back|vsi|pf)->state/\1->state/
        s/&adapter->vsi.state/adapter->vsi.state/
      
      For debugfs, we modify the printing so that we can display chunks of the
      state value on new lines. This ensures that we can print the entire set
      of state values. Additionally, we now print them with the %08lx format
      so that they display nicely.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
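      The pattern described above, sketched (flag list abbreviated):

        enum i40e_state_t {
                __I40E_TESTING,
                __I40E_CONFIG_BUSY,
                /* ... */
                /* must be last: sizes the bitmap at compile time */
                __I40E_STATE_SIZE__,
        };

        struct i40e_pf {
                /* ... */
                DECLARE_BITMAP(state, __I40E_STATE_SIZE__);
        };

        /* state is now an array, so call sites drop the '&': */
        if (test_bit(__I40E_TESTING, pf->state))
                return;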
    • i40e: separate PF and VSI state flags · d19cb64b
      Authored by Jacob Keller
      Avoid using the same named flags for both vsi->state and pf->state. This
      makes code review easier, as it is more likely that future authors will
      use the correct state field when checking bits. Previous commits already
      found issues with at least one check, and possibly others may be
      incorrect.
      
      This reduces confusion as it is more clear what each flag represents,
      and which flags are valid for which state field.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
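      For instance, with separate enums a reviewer can tell at a glance
      which state field a bit belongs to (illustrative flag names):

        if (test_bit(__I40E_VSI_DOWN, vsi->state))      /* a VSI flag */
                return;
        if (test_bit(__I40E_DOWN, pf->state))           /* a PF flag */
                return;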
  13. 20 Apr 2017 (3 commits)
    • i40e/i40evf: Add tracepoints · ed0980c4
      Authored by Scott Peterson
      This patch adds tracepoints to the i40e and i40evf drivers to which
      BPF programs can be attached for feature testing and verification.
      It's expected that an attached BPF program will identify and count or
      log some interesting subset of traffic. The bcc-tools package is
      helpful there for containing all the BPF arcana in a handy Python
      wrapper. Though you can make these tracepoints log trace messages, the
      messages themselves probably won't be very useful (other than to verify
      the tracepoint is being called while you're debugging your BPF program).
      
      The idea here is that tracepoints have such low performance cost when
      disabled that we can leave these in the upstream drivers. This may
      eventually enable the instrumentation of unmodified customer systems
      should the need arise to verify a NIC feature is working as expected.
      In general this enables one set of feature verification tools to be
      used on these drivers whether they're built with the kernel or
      separately.
      
      Users are advised against using these tracepoints for anything other
      than a diagnostic tool. They have a performance impact when enabled,
      and their exact placement and form may change as we see how well they
      work in practice for the purposes above.
      
      Change-ID: Id6014a7322c0e6d08068114dd20bd156f2f6435e
      Signed-off-by: Scott Peterson <scott.d.peterson@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
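      A minimal example of the tracepoint pattern (simplified; the real
      definitions live in i40e_trace.h and carry ring and descriptor
      context):

        #include <linux/tracepoint.h>

        TRACE_EVENT(i40e_example_rx,
                TP_PROTO(const struct sk_buff *skb),
                TP_ARGS(skb),
                TP_STRUCT__entry(__field(unsigned int, len)),
                TP_fast_assign(__entry->len = skb->len;),
                TP_printk("len=%u", __entry->len)
        );

        /* in the hot path; compiles to a nop unless enabled */
        trace_i40e_example_rx(skb);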
    • i40e: Fix support for flow director programming status · 0e626ff7
      Authored by Alexander Duyck
      This patch fixes an issue I introduced when I converted the code over to
      using the length field to determine if a descriptor was done or not. It
      turns out that we are also processing programming descriptors in the Rx
      path and need to have these processed even though the length field will be
      0 on these packets.  What will happen with a programming descriptor is that
      we will receive a descriptor that has the SPH bit set, and the header
      length and packet length fields cleared.
      
      To account for this we should be checking for the split header bit
      being set even though we aren't actually using header split. This bit
      is set in the length field to indicate that a programming descriptor
      response is contained in the descriptor. Since we don't support header
      split, we don't need to perform the extra check of comparing the entire
      length field against a fixed value.
      
      In addition, I am moving the function for checking if a filter is a
      programming status filter into the i40e_txrx.c file; since there is no
      longer support for FCoE, it doesn't make sense to keep this function in
      i40e.h.
      
      Change-ID: I12c359c3dc70adb9d6b92b27324bb2c7f04c1a06
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
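      The check described above reduces to a single bit test (a sketch;
      mask name as in the descriptor headers):

        static inline bool rx_is_programming_status(u64 qword)
        {
                /* The Rx filter programming status and the SPH bit share
                 * the same spot in the descriptor; with header split
                 * unused, the bit marks a programming status descriptor.
                 */
                return qword & I40E_RXD_QW1_LENGTH_SPH_MASK;
        }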
    • i40e/i40evf: Remove VF Rx csum offload for tunneled packets · 53240e99
      Authored by alice michael
      Rx checksum offload for tunneled packets was never being negotiated or
      requested by the VF. This capability was assumed by default and enabled
      in current hardware for the VF. Going forward, this feature needs to be
      disabled, or advanced ptypes should be negotiated with the PF.
      
      Change-ID: I9e54cfa8a90e03ab6956db4412f1e337ccd2c2e0
      Signed-off-by: Preethi Banala <preethi.banala@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  14. 08 Apr 2017 (3 commits)
    • i40e/i40evf: Use build_skb to build frames · f8b45b74
      Authored by Alexander Duyck
      This patch is meant to improve the performance of the Rx path.
      Specifically by using build_skb we have several distinct advantages.
      
      In the case of small frames we were previously using a copy-break approach.
      This means that we were allocating a page fragment to use for skb->head,
      and were having to copy the packet into that region.  Both of those calls
      are now avoided since we just build the skb around the data.
      
      In the case of large frames the gains are much more significant.
      Specifically we were having to allocate skb->head, and copy the headers as
      before.  However in addition we were having to parse the header using
      eth_get_headlen which could be quite expensive.  All of this is avoided by
      building the frame around the data.  I have seen gains as high as 30%
      when using VXLAN, for instance, due to the header pulling overhead alone.
      
      Finally with all this in place it also sets us up to start looking at
      enabling XDP.  Specifically we now have a path in which the data is in the
      page and the frame is built around it.  So if we parse it with XDP before
      we call build_skb we can take care of any necessary processing there.
      
      Change-ID: Id4bdd618e94473d41f892417e5d8019639e421e3
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
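      The core of the build_skb path is small (a sketch; the offset and
      truesize bookkeeping is assumed):

        /* build the skb directly around the data already in the page */
        void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
        struct sk_buff *skb = build_skb(va - I40E_SKB_PAD, truesize);

        if (unlikely(!skb))
                return NULL;

        /* keep the padding as headroom, then mark the received bytes */
        skb_reserve(skb, I40E_SKB_PAD);
        __skb_put(skb, size);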
    • i40e/i40evf: Add support for padding start of frames · ca9ec088
      Authored by Alexander Duyck
      This patch adds padding to the start of frames to provide the headroom
      we need to eventually start using build_skb.  Right now we guarantee at
      least NET_SKB_PAD + NET_IP_ALIGN; however, we allocate more space if
      more is available.  For example, on x86 the headroom should be 192 bytes.
      
      On systems that have too large of a cache line size to support storing 1.5K
      padding and shared info we default to using 3K buffers and reserve
      everything that isn't used for skb_shared_info or the data buffer for
      headroom.
      
      Change-ID: I33c641c9a1ea10cf7cc484c2d20985368d2d709a
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
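      The guarantee above can be sketched as follows (a hypothetical helper;
      with a 2K buffer and a 1536-byte data area this yields the 192 bytes
      mentioned for x86):

        /* headroom = whatever the buffer has left after the data area
         * and skb_shared_info, but never less than the guaranteed floor
         */
        static unsigned int rx_headroom(unsigned int buf_size)
        {
                unsigned int slack = SKB_WITH_OVERHEAD(buf_size) - 1536;

                return max_t(unsigned int, slack,
                             NET_SKB_PAD + NET_IP_ALIGN);
        }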
    • i40e/i40evf: Add support for using order 1 pages with a 3K buffer · 98efd694
      Authored by Alexander Duyck
      There are situations where adding padding to the front and back of an Rx
      buffer will require that we add additional padding.  Specifically, if
      NET_IP_ALIGN is non-zero, or the MTU size is larger than 7.5K, we would
      need to use 2K buffers, which leaves us with no room for the padding.
      
      To preemptively address these cases I am adding support for 3K buffers to
      the Rx path so that we can provide the additional padding needed in the
      event of NET_IP_ALIGN being non-zero or a cache line being greater than 64.
      
      Change-ID: I938bc1ba611285428df39a613cd66f98e60b55c7
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
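      Buffer size then drives the page order (a sketch close to the
      driver's helper):

        static inline unsigned int i40e_rx_pg_order(struct i40e_ring *ring)
        {
        #if (PAGE_SIZE < 8192)
                /* a 3K buffer no longer fits in half of a 4K page, so
                 * move up to order-1 (8K) pages and split those in half
                 */
                if (ring->rx_buf_len > (PAGE_SIZE / 2))
                        return 1;
        #endif
                return 0;
        }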
  15. 29 Mar 2017 (5 commits)
  16. 28 Mar 2017 (3 commits)
  17. 15 Mar 2017 (1 commit)
  18. 19 Feb 2017 (2 commits)
    • i40e: mark the value passed to csum_replace_by_diff as __wsum · b9c015d4
      Authored by Jacob Keller
      Fix, or rather, avoid a sparse warning caused by the fact that
      csum_replace_by_diff expects to receive a __wsum value. Since the
      calculation appears to work, simply typecast the passed paylen value to
      __wsum to avoid the warning.
      
      This seems pretty fishy since __wsum was obviously annotated as
      a separate type on purpose, so this throws the entire calculation into
      question. Since it currently appears to behave as expected, the typecast
      is probably safe.
      
      Change-ID: I4fdc5cddd589abc16098176e8a61127e761488f4
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
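      The shape of the fix (the l4 header union name is assumed;
      csum_replace_by_diff comes from <net/checksum.h>):

        /* paylen is host-order; tell sparse the folded value really is
         * meant to be treated as a __wsum for the checksum update
         */
        csum_replace_by_diff(&l4.tcp->check,
                             (__force __wsum)htonl(paylen));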
    • i40e: Fix Adaptive ITR enabling · 3c234c47
      Authored by Carolyn Wyborny
      This patch fixes a bug introduced with the addition of per-queue
      ITR feature support in ethtool.  With that addition, functions were
      added which converted the ITR settings to binary values.  The
      IS_ENABLED macros that run on those values check whether a bit is
      set or not, and with the value being binary, the bit check always
      returned ITR disabled, which prevented any updating of the ITR rate.
      This patch fixes the problem by changing the functions to return the
      current ITR value instead, renaming them to better reflect their
      function.  These functions now provide a value which will be
      accurately assessed, so the ITR updates as intended.
      
      Change-ID: I14f1d088d052e27f652aaa3113e186415ddea1fc
      Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
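      The bug and fix in miniature (I40E_ITR_DYNAMIC as an assumed flag
      bit in the setting, per the driver headers):

        #define I40E_ITR_DYNAMIC 0x8000
        #define ITR_IS_DYNAMIC(setting) (!!((setting) & I40E_ITR_DYNAMIC))

        static void itr_example(struct i40e_ring *ring)
        {
                /* buggy: collapsing the setting to 0/1 strips the flag */
                u16 enabled = !!ring->rx_itr_setting;
                bool dyn = ITR_IS_DYNAMIC(enabled);     /* always false */

                /* fixed: assess the current ITR value, flag intact */
                dyn = ITR_IS_DYNAMIC(ring->rx_itr_setting);
        }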
  19. 12 Feb 2017 (3 commits)
  20. 03 Feb 2017 (1 commit)
  21. 07 Dec 2016 (1 commit)