1. 01 November 2017, 1 commit
  2. 26 October 2017, 2 commits
  3. 10 October 2017, 3 commits
    • i40e: Fix memory leak related filter programming status · 2b9478ff
      By Alexander Duyck
      It looks like we weren't correctly placing the pages from buffers that had
      been used to return a filter programming status back on the ring. As a
      result they were being overwritten and tracking of the pages was lost.
      
      This change works to correct that by incorporating part of
      i40e_put_rx_buffer into the programming status handler code. As a result we
      should now be correctly placing the pages for those buffers on the
      re-allocation list instead of letting them stay in place.
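
      The gist of the fix, as a trimmed sketch rather than the literal
      upstream diff (helper and field names follow the i40e Rx path of
      that era):

        /* after consuming a programming status descriptor, recycle the
         * page instead of leaving the buffer in place on the ring
         */
        struct i40e_rx_buffer *rx_buffer;

        rx_buffer = &rx_ring->rx_bi[rx_ring->next_to_clean];

        /* ... decode and handle the filter programming status ... */

        i40e_reuse_rx_page(rx_ring, rx_buffer);   /* back on the re-alloc list */
        rx_buffer->page = NULL;                   /* slot now needs a refill */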
      
      Fixes: 0e626ff7 ("i40e: Fix support for flow director programming status")
      Reported-by: Anders K. Pedersen <akp@cohaesio.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Anders K Pedersen <akp@cohaesio.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: bump tail only in multiples of 8 · 11f29003
      By Jacob Keller
      Hardware only fetches descriptors in cachelines of 8, essentially
      ignoring the lower 3 bits of the tail register. Thus, it is pointless to
      bump tail to an unaligned value, as the hardware will ignore some of the
      newly allocated descriptors. Ideally, we should ensure that tail writes
      are always aligned to 8.
      
      At first, it seems like we'd already do this, since we allocate
      descriptors in batches which are a multiple of 8. Since we'd always
      increment by a multiple of 8, it seems like the value should always be
      aligned.
      
      However, this ignores allocation failures. If we fail to allocate
      a buffer, our tail register will become unaligned. Once it has become
      unaligned it will essentially be stuck unaligned until a buffer
      allocation happens to fail at the exact amount necessary to re-align it.
      
      We can do better, by simply rounding down the number of buffers we're
      about to allocate (cleaned_count) such that "next_to_clean
      + cleaned_count" is rounded to the nearest multiple of 8.
      
      We do this by calculating how far off that value is and subtracting it
      from the cleaned_count. This essentially defers allocation of buffers if
      they're going to be ignored by the hardware anyway, and re-aligns our
      next_to_use and tail values after a failure to allocate a descriptor.
      
      This calculation ensures that we always align the tail writes in a way
      the hardware expects and don't unnecessarily allocate buffers which
      won't be fetched immediately.
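
      A minimal, self-contained illustration of the rounding (variable
      names are ours, not the driver's):

        #include <stdint.h>
        #include <stdio.h>

        /* Shrink the refill batch so that (index + batch) stays a
         * multiple of 8, deferring the leftover buffers to a later pass.
         */
        static uint16_t align_batch(uint16_t index, uint16_t batch)
        {
                return batch - ((index + batch) & 0x7);
        }

        int main(void)
        {
                /* an earlier allocation failure left the index at 13; a
                 * batch of 42 would move tail to 55 (unaligned), so only
                 * 35 buffers are refilled and tail lands on 48
                 */
                printf("%u\n", align_batch(13, 42));    /* prints 35 */
                return 0;
        }
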
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: always set the CLEARPBA flag when re-enabling interrupts · dbadbbe2
      By Jacob Keller
      In the past we changed driver behavior to not clear the PBA when
      re-enabling interrupts. This change was motivated by the flawed belief
      that clearing the PBA would cause a lost interrupt if a receive
      interrupt occurred while interrupts were disabled.
      
      According to empirical testing this isn't the case. Additionally, the
      data sheet specifically says that we should set the CLEARPBA bit when
      re-enabling interrupts in a polling setup.
      
      This reverts commit 40d72a50 ("i40e/i40evf: don't lose interrupts")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  4. 06 October 2017, 1 commit
    • i40e: ignore skb->xmit_more when deciding to set RS bit · a5340d93
      By Jacob Keller
      Since commit 6a7fded7 ("i40e: Fix RS bit update in Tx path and
      disable force WB workaround") we've tried to "optimize" setting the
      RS bit based around skb->xmit_more. This same logic was refactored
      in commit 1dc8b538 ("i40e: Reorder logic for coalescing RS bits"),
      but ultimately was not functionally changed.
      
      Using skb->xmit_more in this way is incorrect, because in certain
      circumstances we may see a large number of skbs in sequence with
      xmit_more set. This leads to a performance loss as the hardware does not
      writeback anything for those packets, which delays the time it takes for
      us to respond to the stack transmit requests. This significantly impacts
      UDP performance, especially when layered with multiple devices, such as
      bonding, VLANs, and vnet setups.
      
      This was not noticed until now because it is difficult to create a setup
      which reproduces the issue. It was discovered in a UDP_STREAM test in
      a VM, connected using a vnet device to a bridge, which is connected to
      a bonded pair of X710 ports in active-backup mode with a VLAN. These
      layered devices seem to compound the number of skbs transmitted at once
      by the qdisc. Additionally, the problem can be masked by reducing the
      ITR value.
      
      Since the original commit does not provide strong justification for this
      RS bit "optimization", revert to the previous behavior of setting the RS
      bit every 4th packet.
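
      A small, runnable model of the restored policy (constants and field
      names are illustrative, not the hardware's):

        #include <stdint.h>
        #include <stdio.h>

        #define WB_STRIDE 4           /* request a writeback every 4th packet */
        #define CMD_RS    (1u << 5)   /* stand-in for the descriptor RS bit */

        struct tx_ring { unsigned int packet_stride; };

        /* Decide whether this packet's last descriptor carries RS; note
         * that skb->xmit_more is no longer consulted at all.
         */
        static uint32_t rs_policy(struct tx_ring *ring, uint32_t td_cmd)
        {
                if (++ring->packet_stride >= WB_STRIDE) {
                        td_cmd |= CMD_RS;
                        ring->packet_stride = 0;
                }
                return td_cmd;
        }

        int main(void)
        {
                struct tx_ring ring = { 0 };

                for (int pkt = 1; pkt <= 8; pkt++)
                        printf("pkt %d: RS=%d\n", pkt,
                               !!(rs_policy(&ring, 0) & CMD_RS));
                return 0;        /* RS is set on packets 4 and 8 */
        }
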
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  5. 30 September 2017, 1 commit
  6. 27 September 2017, 1 commit
    • bpf: add meta pointer for direct access · de8f3a83
      By Daniel Borkmann
      This work enables generic transfer of metadata from XDP into skb. The
      basic idea is that we can make use of the fact that the resulting skb
      must be linear and already comes with a larger headroom for supporting
      bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
      on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
      for adjusting a new pointer called xdp->data_meta. Thus, the packet has
      a flexible and programmable room for meta data, followed by the actual
      packet data. struct xdp_buff is therefore laid out such that we first point
      to data_hard_start, then data_meta directly prepended to data followed
      by data_end marking the end of packet. bpf_xdp_adjust_head() takes into
      account whether we have meta data already prepended and if so, memmove()s
      this along with the given offset provided there's enough room.
      
      xdp->data_meta is optional and programs are not required to use it. The
      rationale is that when we process the packet in XDP (e.g. as DoS filter),
      we can push further meta data along with it for the XDP_PASS case, and
      give the guarantee that a clsact ingress BPF program on the same device
      can pick this up for further post-processing. Since we work with skb
      there, we can also set skb->mark, skb->priority or other skb meta data
      out of BPF, thus having this scratch space generic and programmable
      allows for more flexibility than defining a direct 1:1 transfer of
      potentially new XDP members into skb (it's also more efficient as we
      don't need to initialize/handle each of such new members). The facility
      also works together with GRO aggregation. The scratch space at the head
      of the packet can be a multiple of 4 bytes, up to 32 bytes. Drivers not
      yet supporting xdp->data_meta can simply be set up with xdp->data_meta
      as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,
      such that the subsequent match against xdp->data for later access is
      guaranteed to fail.
      
      The verifier treats xdp->data_meta/xdp->data the same way as we treat
      xdp->data/xdp->data_end pointer comparisons. The requirement for doing
      the compare against xdp->data is that it hasn't been modified from its
      original address we got from ctx access. It may have a range marking
      already from prior successful xdp->data/xdp->data_end pointer comparisons
      though.
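
      A minimal XDP program using the new helper (assumes libbpf-style
      headers; the program name and the stored value are only an example):

        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        SEC("xdp")
        int xdp_store_meta(struct xdp_md *ctx)
        {
                void *data, *data_meta;
                __u32 *meta;

                /* grow the metadata area by 4 bytes (negative delta) */
                if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(__u32)))
                        return XDP_PASS;   /* driver may not support data_meta */

                data      = (void *)(long)ctx->data;
                data_meta = (void *)(long)ctx->data_meta;
                meta      = data_meta;

                /* the verifier requires this bounds check against data */
                if ((void *)(meta + 1) > data)
                        return XDP_PASS;

                *meta = 0x42;   /* e.g. a classification result for clsact */
                return XDP_PASS;
        }

        char _license[] SEC("license") = "GPL";
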
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 28 August 2017, 4 commits
    • i40e/i40evf: avoid dynamic ITR updates when polling or low packet rate · 742c9875
      By Jacob Keller
      The dynamic ITR algorithm depends on a calculation of usecs which
      assumes that the interrupts have been firing constantly at the interrupt
      throttle rate. This is not guaranteed because we could have a low packet
      rate, or have been polling in software.
      
      We'll estimate whether this is the case by using jiffies to determine
      whether too much time has passed since the last update. If the jiffies
      difference is large, the calculation is guaranteed to be incorrect. If
      the jiffies difference is small, we might have been polling for part of
      the interval, but that shouldn't affect the calculation much.
      
      This ensures that we don't get stuck in BULK latency during certain rare
      situations where we receive bursts of packets that force us into NAPI
      polling.
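
      A userspace model of the staleness check (HZ and the one-second
      bound are illustrative; the driver's exact threshold may differ):

        #include <stdbool.h>
        #include <stdio.h>

        #define HZ 250   /* jiffies per second on this example config */

        /* If too many jiffies elapsed since the last update, interrupts
         * were clearly not firing at the throttle rate, so the usecs
         * estimate would be wrong and the dynamic ITR update is skipped.
         */
        static bool itr_sample_is_stale(unsigned long now, unsigned long last)
        {
                return (now - last) > HZ;
        }

        int main(void)
        {
                printf("%d\n", itr_sample_is_stale(1000, 500)); /* 1: stale */
                printf("%d\n", itr_sample_is_stale(1000, 900)); /* 0: fresh */
                return 0;
        }
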
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: remove ULTRA latency mode · 0a2c7722
      By Jacob Keller
      Since commit c56625d5 ("i40e/i40evf: change dynamic interrupt
      thresholds") a new higher latency ITR setting called I40E_ULTRA_LATENCY
      was added with a cryptic comment about how it was meant for adjusting Rx
      more aggressively when streaming small packets.
      
      This mode was attempting to calculate packets per second and then kick
      in when we have a huge number of small packets.
      
      Unfortunately, the ULTRA setting was kicking in for workloads it wasn't
      intended for, including single-threaded UDP_STREAM workloads.
      
      This wasn't caught for a variety of reasons. First, the ip_defrag
      routines were improved somewhat which makes the UDP_STREAM test still
      reasonable at 10GbE, even when dropped down to 8k interrupts a second.
      Additionally, some other obvious workloads appear to work fine, such
      as TCP_STREAM.
      
      The 40k packets-per-second threshold doesn't make sense for a number of
      reasons. First, we can absolutely do more than 40k packets per second.
      Second, we calculate the value inline in an integer, which can sometimes
      overflow, resulting in incorrect values.
      
      If we fix this overflow it makes it even more likely that we'll enter
      ULTRA mode which is the opposite of what we want.
      
      The ULTRA mode was added originally as a way to reduce CPU utilization
      during a small packet workload where we weren't keeping up anyway. It
      should never have been kicking in during these other workloads.
      
      Given the issues outlined above, let's remove the ULTRA latency mode. If
      necessary, a better solution to the CPU utilization issue for small
      packet workloads will be added in a future patch.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: invert logic for checking incorrect cpu vs irq affinity · 6d977729
      By Jacob Keller
      In commit 96db776a ("i40e/vf: fix interrupt affinity bug")
      we added some code to force exit of polling in case we did
      not have the correct CPU. This is important since it was possible for
      the IRQ affinity to be changed while the CPU is pegged at 100%. This can
      result in the polling routine being stuck on the wrong CPU until
      traffic finally stops.
      
      Unfortunately, the implementation, "if the CPU is correct, exit as
      normal, otherwise, fall-through to the end-polling exit" is incredibly
      confusing to reason about. In this case, the normal flow looks like the
      exception, while the exception actually occurs far away from the if
      statement and comment.
      
      We recently discovered and fixed a bug in this code because we were
      incorrectly initializing the affinity mask.
      
      Re-write the code so that the exceptional case is handled at the check,
      rather than having the logic be spread through the regular exit flow.
      This does end up with minor code duplication, but the resulting code is
      much easier to reason about.
      
      The new logic is identical, but inverted. If we are running on a CPU not
      in our affinity mask, we'll exit polling. However, the code flow is much
      easier to understand.
      
      Note that we don't actually have to check for MSI-X, because in the MSI
      case we'll only have one q_vector, but its default affinity mask should
      be correct as it includes all CPUs when it's initialized. Further, we
      could at some point add code to setup the notifier for the non-MSI-X
      case and enable this workaround for that case too, if desired, though
      there isn't much gain since it's unlikely to be the common case.
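
      A reduced model of the inverted flow (cpumask handling shrunk to a
      plain bitmask; names are ours):

        #include <stdio.h>

        enum poll_action { STOP_POLLING, KEEP_POLLING };

        /* Handle the exceptional case (running on a CPU outside the
         * affinity mask) up front instead of threading it through the
         * normal exit path.
         */
        static enum poll_action napi_poll_model(unsigned int cpu,
                                                unsigned long affinity_mask,
                                                int work_left)
        {
                if (!(affinity_mask & (1ul << cpu)))
                        return STOP_POLLING;   /* let the IRQ re-fire on a
                                                * CPU that is in the mask */

                return work_left ? KEEP_POLLING : STOP_POLLING;
        }

        int main(void)
        {
                /* affinity mask allows CPU 0 only */
                printf("%d\n", napi_poll_model(3, 0x1, 10)); /* 0: forced exit */
                printf("%d\n", napi_poll_model(0, 0x1, 10)); /* 1: keep polling */
                return 0;
        }
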
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: move enabling icr0 into i40e_update_enable_itr · 9254c0e3
      By Jacob Keller
      If we don't have MSI-X enabled, we handle all interrupts on icr0. This
      is a special case, so let's move the conditional into
      i40e_update_enable_itr() in order to make i40e_napi_poll easier to
      read.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  8. 02 August 2017, 1 commit
  9. 26 July 2017, 2 commits
  10. 21 June 2017, 2 commits
  11. 13 June 2017, 1 commit
    • i40e: fix handling of HW ATR eviction · 6964e53f
      By Jacob Keller
      A recent commit to refactor the driver and remove the hw_disabled_flags
      field accidentally introduced two regressions. First, we overwrote
      pf->flags which removed various key flags including the MSI-X settings.
      
      Additionally, it was intended that we now have two flags,
      HW_ATR_EVICT_CAPABLE and HW_ATR_EVICT_ENABLED, but this was not done,
      and we were accidentally misusing HW_ATR_EVICT_CAPABLE everywhere.
      
      This patch adds the missing piece, HW_ATR_EVICT_ENABLED, and safely
      updates pf->flags instead of overwriting it.
      
      Without this patch we will have many problems including disabling MSI-X
      support, and we'll attempt to use HW ATR eviction on devices which do
      not support it.
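
      A small, runnable model of the two-flag pattern (bit values are
      illustrative, not the driver's):

        #include <stdio.h>

        #define HW_ATR_EVICT_CAPABLE  (1u << 0)
        #define HW_ATR_EVICT_ENABLED  (1u << 1)

        int main(void)
        {
                unsigned int pf_flags = 0x80;   /* pretend MSI-X bit, etc. */

                /* record the capability without clobbering other flags */
                pf_flags |= HW_ATR_EVICT_CAPABLE;    /* not: pf_flags = ... */

                /* only enable the feature where the capability exists */
                if (pf_flags & HW_ATR_EVICT_CAPABLE)
                        pf_flags |= HW_ATR_EVICT_ENABLED;

                printf("use HW ATR eviction: %d\n",
                       !!(pf_flags & HW_ATR_EVICT_ENABLED));
                printf("other flags preserved: %d\n", !!(pf_flags & 0x80));
                return 0;
        }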
      
      Fixes: 47994c11 ("i40e: remove hw_disabled_flags in favor of using separate flag bits", 2017-04-19)
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 06 June 2017, 1 commit
  13. 31 May 2017, 3 commits
  14. 30 April 2017, 3 commits
    • i40e: remove hw_disabled_flags in favor of using separate flag bits · 47994c11
      By Jacob Keller
      The hw_disabled_flags field was added as a way of signifying that
      a feature was automatically or temporarily disabled. However, we
      actually only use this for FDir features. Replace its use with new
      _AUTO_DISABLED flags instead. This is more readable, because you aren't
      setting an *_ENABLED flag to *disable* the feature.
      
      Additionally, clean up a few areas where we used these bits. First, we
      don't really need to set the auto-disable flag for ATR if we're fully
      disabling the feature via ethtool.
      
      Second, we should always clear the auto-disable bits in case they somehow
      got set when the feature was disabled. However, avoid displaying
      a message that we've re-enabled the feature.
      
      Third, we shouldn't be re-enabling ATR in the SB ntuple add flow,
      because it might have been disabled due to space constraints. Instead,
      we should just wait for the fdir_check_and_reenable to be called by the
      watchdog.
      
      Overall, this change allows us to simplify some code by removing an
      extra field we didn't need, and the result should make it more clear as
      to what we're actually doing with these flags.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: use DECLARE_BITMAP for state fields · 0da36b97
      By Jacob Keller
      Instead of assuming our flags fit within an unsigned long, use
      DECLARE_BITMAP which will ensure that we always allocate enough space.
      Additionally, use __I40E_STATE_SIZE__ markers as the last element of the
      enumeration so that the size of the BITMAP is compile-time assigned
      rather than programmer-time assigned. This ensures that potential future
      flag additions do not actually overrun the array. This is especially
      important as 32bit systems would only have 32bit longs instead of 64bit
      longs as we generally have assumed in the prior code.
      
      This change also removes a dereference of the state fields throughout
      the code, so it does have a bit of code churn. The conversions were
      automated using sed replacements with an alternation
      
        s/&(vsi->back|vsi|pf)->state/\1->state/
        s/&adapter->vsi.state/adapter->vsi.state/
      
      For debugfs, we modify the printing so that we can display chunks of the
      state value on new lines. This ensures that we can print the entire set
      of state values. Additionally, we now print them as 08lx to ensure that
      they display nicely.
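
      A sketch of the pattern (kernel-style fragment; the state names here
      are examples, not the driver's full list):

        /* enum ends with a SIZE marker so the bitmap grows automatically
         * whenever new state bits are added
         */
        enum i40e_vsi_state {
                __I40E_VSI_DOWN,
                __I40E_VSI_NEEDS_RESTART,
                /* ... */
                __I40E_VSI_STATE_SIZE__,        /* must be last */
        };

        struct i40e_vsi {
                DECLARE_BITMAP(state, __I40E_VSI_STATE_SIZE__);
                /* ... */
        };

        /* usage: the bitmap is passed directly, no address-of needed */
        if (test_bit(__I40E_VSI_DOWN, vsi->state))
                return;
        set_bit(__I40E_VSI_NEEDS_RESTART, vsi->state);
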
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: separate PF and VSI state flags · d19cb64b
      By Jacob Keller
      Avoid using the same named flags for both vsi->state and pf->state. This
      makes code review easier, as it is more likely that future authors will
      use the correct state field when checking bits. Previous commits already
      found issues with at least one check, and possibly others may be
      incorrect.
      
      This reduces confusion as it is more clear what each flag represents,
      and which flags are valid for which state field.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  15. 20 April 2017, 2 commits
    • i40e/i40evf: Add tracepoints · ed0980c4
      By Scott Peterson
      This patch adds tracepoints to the i40e and i40evf drivers to which
      BPF programs can be attached for feature testing and verification.
      It's expected that an attached BPF program will identify and count or
      log some interesting subset of traffic. The bcc-tools package is
      helpful there for containing all the BPF arcana in a handy Python
      wrapper. Though you can make these tracepoints log trace messages, the
      messages themselves probably won't be very useful (other than to verify the
      tracepoint is being called while you're debugging your BPF program).
      
      The idea here is that tracepoints have such low performance cost when
      disabled that we can leave these in the upstream drivers. This may
      eventually enable the instrumentation of unmodified customer systems
      should the need arise to verify a NIC feature is working as expected.
      In general this enables one set of feature verification tools to be
      used on these drivers whether they're built with the kernel or
      separately.
      
      Users are advised against using these tracepoints for anything other
      than a diagnostic tool. They have a performance impact when enabled,
      and their exact placement and form may change as we see how well they
      work in practice for the purposes above.
      
      Change-ID: Id6014a7322c0e6d08068114dd20bd156f2f6435e
      Signed-off-by: Scott Peterson <scott.d.peterson@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Fix support for flow director programming status · 0e626ff7
      By Alexander Duyck
      This patch fixes an issue I introduced when I converted the code over to
      using the length field to determine if a descriptor was done or not. It
      turns out that we are also processing programming descriptors in the Rx
      path and need to have these processed even though the length field will be
      0 on these packets.  What will happen with a programming descriptor is that
      we will receive a descriptor that has the SPH bit set, and the header
      length and packet length fields cleared.
      
      To account for this we should be checking for the bit for split header
      being set even though we aren't actually using header split. This bit is
      set in the length field to indicate if a programming descriptor response is
      contained in the descriptor. Since we don't support header split we don't
      need to perform the extra checks of using a fixed value for the entire
      length field.
      
      In addition, I am moving the function for checking if a filter is a
      programming status filter into the i40e_txrx.c file; since FCoE support
      has been removed, it no longer makes sense to keep this function in
      i40e.h.
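
      Roughly, the helper boils down to a check of the SPH bit in the
      length field (a paraphrase, not necessarily the exact upstream code):

        /* The Rx filter programming status and the SPH bit occupy the
         * same spot in the descriptor qword.  Since packet split is not
         * supported, the bit can be reused as "this is a programming
         * status descriptor".
         */
        static inline bool i40e_rx_is_programming_status(u64 qword)
        {
                return qword & I40E_RXD_QW1_LENGTH_SPH_MASK;
        }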
      
      Change-ID: I12c359c3dc70adb9d6b92b27324bb2c7f04c1a06
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  16. 08 April 2017, 6 commits
  17. 29 March 2017, 4 commits
  18. 28 March 2017, 2 commits