1. 13 Feb 2018 (3 commits)
  2. 27 Jan 2018 (1 commit)
  3. 24 Jan 2018 (1 commit)
    • i40e/i40evf: Detect and recover hung queue scenario · 07d44190
      Authored by Sudheer Mogilappagari
      In VFs, there is a known issue which can cause writebacks
      not to occur when interrupts are disabled and there are
      fewer than 4 descriptors, resulting in a TX timeout. A timeout
      can also occur due to a lost interrupt.
      
      The current implementation for detecting and recovering
      from hung queues in the PF is problematic because it
      actively encourages lost interrupts.  By triggering a SW
      interrupt, interrupts are forced on.  If we are already in
      napi_poll and an interrupt fires, napi_poll will not be
      rescheduled and the interrupt is effectively lost, thereby
      potentially *causing* hung queues.
      
      This patch checks whether packets are being processed between
      watchdog cycles, uses that to detect a potentially hung queue,
      and triggers a SW interrupt only for that particular queue
      (see the sketch below).
      Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
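      As a rough illustration of the watchdog check described above (a
      sketch, not the driver's exact code; the prev_pkt field and the
      exact helper signatures are assumptions):

        static void detect_recover_hung_queue(struct i40e_vsi *vsi, int q_idx)
        {
                struct i40e_ring *tx_ring = vsi->tx_rings[q_idx];
                u64 packets = tx_ring->stats.packets;

                /* no packets processed since the last watchdog cycle, but
                 * descriptors still pending: likely a hung queue, so fire
                 * a SW interrupt for this queue only
                 */
                if (tx_ring->prev_pkt == packets && i40e_get_tx_pending(tx_ring))
                        i40e_force_wb(vsi, tx_ring->q_vector);

                tx_ring->prev_pkt = packets;
        }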
  4. 04 Jan 2018 (1 commit)
    • i40e/i40evf: Account for frags split over multiple descriptors in check linearize · 248de22e
      Authored by Alexander Duyck
      The original code for __i40e_chk_linearize didn't take into account the
      fact that if a fragment is 16K in size or larger it has to be split over 2
      descriptors and the smaller of those 2 descriptors will be on the trailing
      edge of the transmit. As a result we could get into situations where we
      failed to catch requests that could result in a Tx hang.
      
      This patch takes care of that by subtracting the length of all but the
      trailing edge of the stale fragment before we test the sum. By doing this
      we can guarantee that we have all cases covered, including the case of a
      fragment that spans multiple descriptors. We don't need to worry about
      checking the inner portions of this since 12K is the maximum aligned DMA
      size and that is larger than any MSS will ever be since the MTU limit for
      jumbos is something on the order of 9K.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
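      The trailing-edge math reduces to something like this (a sketch using
      the constants named above; the real __i40e_chk_linearize folds this
      into its walk over the skb fragments):

        /* A descriptor carries at most 16K - 1 bytes and the maximum
         * aligned DMA chunk is 12K, so a large fragment is consumed in
         * 12K chunks with only the remainder on the trailing edge.
         */
        #define MAX_DATA_PER_TXD          (16 * 1024 - 1)
        #define MAX_DATA_PER_TXD_ALIGNED  (12 * 1024)

        static int stale_trailing_edge(int stale_size)
        {
                /* subtract all but the trailing edge of the fragment */
                while (stale_size > MAX_DATA_PER_TXD)
                        stale_size -= MAX_DATA_PER_TXD_ALIGNED;
                return stale_size;
        }

      Only the value returned here is subtracted from the running sum when
      the fragment goes stale, so a 16K fragment correctly contributes its
      smaller 4K trailing descriptor rather than its full length.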
  5. 22 Nov 2017 (1 commit)
  6. 01 Nov 2017 (1 commit)
  7. 10 Oct 2017 (2 commits)
    • i40e/i40evf: bump tail only in multiples of 8 · 11f29003
      Authored by Jacob Keller
      Hardware only fetches descriptors on cachelines of 8, essentially
      ignoring the lower 3 bits of the tail register. It is therefore
      pointless to bump tail by an unaligned value, as the hardware will
      ignore some of the new descriptors we allocated. Ideally, we should
      ensure tail writes are always aligned to a multiple of 8.
      
      At first, it seems like we'd already do this, since we allocate
      descriptors in batches which are a multiple of 8. Since we'd always
      increment by a multiple of 8, it seems like the value should always be
      aligned.
      
      However, this ignores allocation failures. If we fail to allocate
      a buffer, our tail register will become unaligned. Once it has become
      unaligned it will essentially be stuck unaligned until a buffer
      allocation happens to fail at the exact amount necessary to re-align it.
      
      We can do better by simply rounding down the number of buffers we're
      about to allocate (cleaned_count) such that "next_to_clean
      + cleaned_count" is rounded down to a multiple of 8.
      
      We do this by calculating how far off that value is and subtracting it
      from the cleaned_count. This essentially defers allocation of buffers if
      they're going to be ignored by hardware anyway, and re-aligns our
      next_to_use and tail values after a failure to allocate a descriptor.
      
      This calculation ensures that we always align the tail writes in a way
      the hardware expects and don't unnecessarily allocate buffers which
      won't be fetched immediately.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
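      The rounding described above amounts to a one-line adjustment (a
      sketch; field names assumed):

        /* Defer buffers that would leave tail unaligned: shrink
         * cleaned_count so that next_to_clean + cleaned_count lands on
         * a multiple of 8, matching the hardware's cacheline fetches.
         */
        cleaned_count -= (rx_ring->next_to_clean + cleaned_count) & 0x7;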
    • i40e/i40evf: always set the CLEARPBA flag when re-enabling interrupts · dbadbbe2
      Authored by Jacob Keller
      In the past we changed driver behavior to not clear the PBA when
      re-enabling interrupts. This change was motivated by the flawed belief
      that clearing the PBA would cause a lost interrupt if a receive
      interrupt occurred while interrupts were disabled.
      
      According to empirical testing this isn't the case. Additionally, the
      data sheet specifically says that we should set the CLEARPBA bit when
      re-enabling interrupts in a polling setup.
      
      This reverts commit 40d72a50 ("i40e/i40evf: don't lose interrupts")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
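      For reference, re-enabling a vector with the CLEARPBA bit set looks
      roughly like this (register macros as in i40e_register.h; the exact
      enable path varies by driver version):

        u32 val = I40E_PFINT_DYN_CTLN_INTENA_MASK |
                  I40E_PFINT_DYN_CTLN_CLEARPBA_MASK |   /* clear pending */
                  (I40E_ITR_NONE << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT);

        wr32(&pf->hw, I40E_PFINT_DYN_CTLN(vector - 1), val);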
  8. 30 Sep 2017 (1 commit)
  9. 28 Aug 2017 (3 commits)
    • i40e/i40evf: avoid dynamic ITR updates when polling or low packet rate · 742c9875
      Authored by Jacob Keller
      The dynamic ITR algorithm depends on a calculation of usecs which
      assumes that the interrupts have been firing constantly at the interrupt
      throttle rate. This is not guaranteed because we could have a low packet
      rate, or have been polling in software.
      
      We estimate whether this is the case by using jiffies to measure how
      long it has been since the last ITR update. If the jiffies difference
      is large, the calculation is guaranteed to be incorrect. If it is
      small, we might have done some polling, but the difference shouldn't
      affect the calculation much.
      
      This ensures that we don't get stuck in BULK latency during certain rare
      situations where we receive bursts of packets that force us into NAPI
      polling.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
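      A sketch of that guard (field names assumed; jiffies_to_usecs is the
      standard kernel helper):

        /* If it has been too long since the last update, the usecs value
         * assumed by the algorithm is wrong, so skip this adjustment.
         */
        unsigned long elapsed = jiffies - rc->last_itr_update;

        if (jiffies_to_usecs(elapsed) > itr_interval_usecs) {
                rc->last_itr_update = jiffies;
                return;         /* keep the current ITR setting */
        }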
    • i40e/i40evf: remove ULTRA latency mode · 0a2c7722
      Authored by Jacob Keller
      Since commit c56625d5 ("i40e/i40evf: change dynamic interrupt
      thresholds") a new higher latency ITR setting called I40E_ULTRA_LATENCY
      was added with a cryptic comment about how it was meant for adjusting Rx
      more aggressively when streaming small packets.
      
      This mode was attempting to calculate packets per second and then kick
      in when we have a huge number of small packets.
      
      Unfortunately, the ULTRA setting was kicking in for workloads it wasn't
      intended for including single-thread UDP_STREAM workloads.
      
      This wasn't caught for a variety of reasons. First, the ip_defrag
      routines were improved somewhat which makes the UDP_STREAM test still
      reasonable at 10GbE, even when dropped down to 8k interrupts a second.
      Additionally, some other obvious workloads appear to work fine, such
      as TCP_STREAM.
      
      The 40k packets-per-second threshold doesn't make sense for a number of
      reasons. First, we absolutely can do more than 40k packets per second.
      Second, we calculate the value inline in an integer, which can sometimes
      overflow, resulting in incorrect values.
      
      If we fix this overflow, it becomes even more likely that we'll enter
      ULTRA mode, which is the opposite of what we want.
      
      The ULTRA mode was added originally as a way to reduce CPU utilization
      during a small packet workload where we weren't keeping up anyways. It
      should never have been kicking in during these other workloads.
      
      Given the issues outlined above, let's remove the ULTRA latency mode. If
      necessary, a better solution to the CPU utilization issue for small
      packet workloads will be added in a future patch.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
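      The inline overflow mentioned above is easy to demonstrate (a
      constructed example, not the removed code itself):

        u32 packets = 5000, usecs = 50000;      /* 5000 pkts in 50 ms */

        /* 5000 * 1000000 = 5e9 wraps in 32-bit math, giving ~14k pps */
        u32 pps_bad = (packets * 1000000u) / usecs;

        /* widening to 64 bits gives the correct 100000 pps */
        u32 pps_ok = (u32)(((u64)packets * 1000000u) / usecs);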
    • i40e: invert logic for checking incorrect cpu vs irq affinity · 6d977729
      Authored by Jacob Keller
      In commit 96db776a ("i40e/vf: fix interrupt affinity bug")
      we added some code to force exit of polling in case we did
      not have the correct CPU. This is important since it was possible for
      the IRQ affinity to be changed while the CPU is pegged at 100%. This can
      result in the polling routine being stuck on the wrong CPU until
      traffic finally stops.
      
      Unfortunately, the implementation, "if the CPU is correct, exit as
      normal, otherwise, fall-through to the end-polling exit" is incredibly
      confusing to reason about. In this case, the normal flow looks like the
      exception, while the exception actually occurs far away from the if
      statement and comment.
      
      We recently discovered and fixed a bug in this code because we were
      incorrectly initializing the affinity mask.
      
      Re-write the code so that the exceptional case is handled at the check,
      rather than having the logic be spread through the regular exit flow.
      This does end up with minor code duplication, but the resulting code is
      much easier to reason about.
      
      The new logic is identical, but inverted. If we are running on a CPU not
      in our affinity mask, we'll exit polling. However, the code flow is much
      easier to understand.
      
      Note that we don't actually have to check for MSI-X, because in the MSI
      case we'll only have one q_vector, but its default affinity mask should
      be correct as it includes all CPUs when it's initialized. Further, we
      could at some point add code to set up the notifier for the non-MSI-X
      case and enable this workaround for that case too, if desired, though
      there isn't much gain since it's unlikely to be the common case.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
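      The inverted check sits at the top of the not-complete branch of the
      napi poll routine, roughly (a sketch of the shape, not a verbatim
      excerpt):

        if (!clean_complete) {
                int cpu_id = smp_processor_id();

                /* exceptional case handled right at the check: stop
                 * polling on the wrong CPU so the interrupt can move
                 */
                if (!cpumask_test_cpu(cpu_id, &q_vector->affinity_mask)) {
                        napi_complete_done(napi, work_done);
                        i40e_force_wb(vsi, q_vector);   /* resume via IRQ */
                        return budget - 1;
                }

                /* normal flow: keep polling on the right CPU */
                return budget;
        }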
  10. 26 Jul 2017 (1 commit)
  11. 06 Jun 2017 (1 commit)
  12. 30 Apr 2017 (2 commits)
    • i40e: use DECLARE_BITMAP for state fields · 0da36b97
      Authored by Jacob Keller
      Instead of assuming our flags fit within an unsigned long, use
      DECLARE_BITMAP which will ensure that we always allocate enough space.
      Additionally, use __I40E_STATE_SIZE__ markers as the last element of the
      enumeration so that the size of the BITMAP is compile-time assigned
      rather than programmer-time assigned. This ensures that potential future
      flag additions do not actually overrun the array. This is especially
      important as 32-bit systems have only 32-bit longs instead of the
      64-bit longs the prior code generally assumed.
      
      This change also drops the address-of operator on the state fields
      throughout the code, so it does have a bit of code churn. The conversions were
      automated using sed replacements with an alternation
      
        s/&(vsi->back|vsi|pf)->state/\1->state/
        s/&adapter->vsi.state/adapter->vsi.state/
      
      For debugfs, we modify the printing so that we can display chunks of the
      state value on new lines. This ensures that we can print the entire set
      of state values. Additionally, we now print them with the %08lx format
      so that they display nicely.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
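      The pattern described above, sketched (flag list abbreviated):

        enum i40e_state_t {
                __I40E_TESTING,
                __I40E_CONFIG_BUSY,
                /* ... */
                /* must be last: sizes the bitmap at compile time */
                __I40E_STATE_SIZE__,
        };

        struct i40e_pf {
                /* ... */
                DECLARE_BITMAP(state, __I40E_STATE_SIZE__);
        };

        /* state is now an array, so call sites drop the '&': */
        if (test_bit(__I40E_TESTING, pf->state))
                return;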
    • i40e: separate PF and VSI state flags · d19cb64b
      Authored by Jacob Keller
      Avoid using the same named flags for both vsi->state and pf->state. This
      makes code review easier, as it is more likely that future authors will
      use the correct state field when checking bits. Previous commits already
      found issues with at least one check, and possibly others may be
      incorrect.
      
      This reduces confusion as it is more clear what each flag represents,
      and which flags are valid for which state field.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
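      For instance, with separate enums a reviewer can tell at a glance
      which state field a bit belongs to (illustrative flag names):

        if (test_bit(__I40E_VSI_DOWN, vsi->state))      /* a VSI flag */
                return;
        if (test_bit(__I40E_DOWN, pf->state))           /* a PF flag */
                return;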
  13. 20 Apr 2017 (3 commits)
    • i40e/i40evf: Add tracepoints · ed0980c4
      Authored by Scott Peterson
      This patch adds tracepoints to the i40e and i40evf drivers to which
      BPF programs can be attached for feature testing and verification.
      It's expected that an attached BPF program will identify and count or
      log some interesting subset of traffic. The bcc-tools package is
      helpful there for containing all the BPF arcana in a handy Python
      wrapper. Though you can make these tracepoints log trace messages, the
      messages themselves probably won't be very useful (other than to verify
      the tracepoint is being called while you're debugging your BPF program).
      
      The idea here is that tracepoints have such low performance cost when
      disabled that we can leave these in the upstream drivers. This may
      eventually enable the instrumentation of unmodified customer systems
      should the need arise to verify a NIC feature is working as expected.
      In general this enables one set of feature verification tools to be
      used on these drivers whether they're built with the kernel or
      separately.
      
      Users are advised against using these tracepoints for anything other
      than a diagnostic tool. They have a performance impact when enabled,
      and their exact placement and form may change as we see how well they
      work in practice for the purposes above.
      
      Change-ID: Id6014a7322c0e6d08068114dd20bd156f2f6435e
      Signed-off-by: Scott Peterson <scott.d.peterson@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
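      A minimal example of the tracepoint pattern (simplified; the real
      definitions live in i40e_trace.h and carry ring and descriptor
      context):

        #include <linux/tracepoint.h>

        TRACE_EVENT(i40e_example_rx,
                TP_PROTO(const struct sk_buff *skb),
                TP_ARGS(skb),
                TP_STRUCT__entry(__field(unsigned int, len)),
                TP_fast_assign(__entry->len = skb->len;),
                TP_printk("len=%u", __entry->len)
        );

        /* in the hot path; compiles to a nop unless enabled */
        trace_i40e_example_rx(skb);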
    • i40e: Fix support for flow director programming status · 0e626ff7
      Authored by Alexander Duyck
      This patch fixes an issue I introduced when I converted the code over to
      using the length field to determine if a descriptor was done or not. It
      turns out that we are also processing programming descriptors in the Rx
      path and need to have these processed even though the length field will be
      0 on these packets.  What will happen with a programming descriptor is that
      we will receive a descriptor that has the SPH bit set, and the header
      length and packet length fields cleared.
      
      To account for this we should be checking for the split header bit
      being set even though we aren't actually using header split. This bit
      is set in the length field to indicate that a programming descriptor
      response is contained in the descriptor. Since we don't support header
      split, we don't need to perform the extra check of comparing the entire
      length field against a fixed value.
      
      In addition, I am moving the function for checking if a filter is a
      programming status filter into the i40e_txrx.c file; since there is no
      longer support for FCoE, it doesn't make sense to keep this function in
      i40e.h.
      
      Change-ID: I12c359c3dc70adb9d6b92b27324bb2c7f04c1a06
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
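      The check described above reduces to a single bit test (a sketch;
      mask name as in the descriptor headers):

        static inline bool rx_is_programming_status(u64 qword)
        {
                /* The Rx filter programming status and the SPH bit share
                 * the same spot in the descriptor; with header split
                 * unused, the bit marks a programming status descriptor.
                 */
                return qword & I40E_RXD_QW1_LENGTH_SPH_MASK;
        }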
    • i40e/i40evf: Remove VF Rx csum offload for tunneled packets · 53240e99
      Authored by alice michael
      Rx checksum offload for tunneled packets was never being negotiated or
      requested by the VF. This capability was assumed by default and enabled
      in current hardware for the VF. Going forward, this feature needs to be
      disabled, or advanced ptypes should be negotiated with the PF.
      
      Change-ID: I9e54cfa8a90e03ab6956db4412f1e337ccd2c2e0
      Signed-off-by: Preethi Banala <preethi.banala@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  14. 08 Apr 2017 (3 commits)
    • i40e/i40evf: Use build_skb to build frames · f8b45b74
      Authored by Alexander Duyck
      This patch is meant to improve the performance of the Rx path.
      Specifically by using build_skb we have several distinct advantages.
      
      In the case of small frames we were previously using a copy-break approach.
      This means that we were allocating a page fragment to use for skb->head,
      and were having to copy the packet into that region.  Both of those calls
      are now avoided since we just build the skb around the data.
      
      In the case of large frames the gains are much more significant.
      Specifically we were having to allocate skb->head, and copy the headers as
      before.  However in addition we were having to parse the header using
      eth_get_headlen which could be quite expensive.  All of this is avoided by
      building the frame around the data.  I have seen gains as high as 30%
      when using VXLAN, for instance, due to the header pulling overhead alone.
      
      Finally with all this in place it also sets us up to start looking at
      enabling XDP.  Specifically we now have a path in which the data is in the
      page and the frame is built around it.  So if we parse it with XDP before
      we call build_skb we can take care of any necessary processing there.
      
      Change-ID: Id4bdd618e94473d41f892417e5d8019639e421e3
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
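      The core of the build_skb path is small (a sketch; the offset and
      truesize bookkeeping is assumed):

        /* build the skb directly around the data already in the page */
        void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
        struct sk_buff *skb = build_skb(va - I40E_SKB_PAD, truesize);

        if (unlikely(!skb))
                return NULL;

        /* keep the padding as headroom, then mark the received bytes */
        skb_reserve(skb, I40E_SKB_PAD);
        __skb_put(skb, size);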
    • i40e/i40evf: Add support for padding start of frames · ca9ec088
      Authored by Alexander Duyck
      This patch adds padding to the start of frames to provide the headroom
      we need to eventually start using build_skb.  Right now we guarantee at
      least NET_SKB_PAD + NET_IP_ALIGN; however, we allocate more space if
      more is available.  For example, on x86 the headroom should be 192 bytes.
      
      On systems that have too large of a cache line size to support storing 1.5K
      padding and shared info we default to using 3K buffers and reserve
      everything that isn't used for skb_shared_info or the data buffer for
      headroom.
      
      Change-ID: I33c641c9a1ea10cf7cc484c2d20985368d2d709a
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
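      The guarantee above can be sketched as follows (a hypothetical helper;
      with a 2K buffer and a 1536-byte data area this yields the 192 bytes
      mentioned for x86):

        /* headroom = whatever the buffer has left after the data area
         * and skb_shared_info, but never less than the guaranteed floor
         */
        static unsigned int rx_headroom(unsigned int buf_size)
        {
                unsigned int slack = SKB_WITH_OVERHEAD(buf_size) - 1536;

                return max_t(unsigned int, slack,
                             NET_SKB_PAD + NET_IP_ALIGN);
        }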
    • i40e/i40evf: Add support for using order 1 pages with a 3K buffer · 98efd694
      Authored by Alexander Duyck
      There are situations where adding padding to the front and back of an Rx
      buffer will require that we add additional padding.  Specifically, if
      NET_IP_ALIGN is non-zero, or the MTU size is larger than 7.5K, we would
      need to use 2K buffers, which leaves us with no room for the padding.
      
      To preemptively address these cases I am adding support for 3K buffers to
      the Rx path so that we can provide the additional padding needed in the
      event of NET_IP_ALIGN being non-zero or a cache line being greater than 64.
      
      Change-ID: I938bc1ba611285428df39a613cd66f98e60b55c7
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
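      Buffer size then drives the page order (a sketch close to the
      driver's helper):

        static inline unsigned int i40e_rx_pg_order(struct i40e_ring *ring)
        {
        #if (PAGE_SIZE < 8192)
                /* a 3K buffer no longer fits in half of a 4K page, so
                 * move up to order-1 (8K) pages and split those in half
                 */
                if (ring->rx_buf_len > (PAGE_SIZE / 2))
                        return 1;
        #endif
                return 0;
        }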
  15. 29 Mar 2017 (5 commits)
  16. 28 Mar 2017 (3 commits)
  17. 15 Mar 2017 (1 commit)
  18. 19 Feb 2017 (2 commits)
    • i40e: mark the value passed to csum_replace_by_diff as __wsum · b9c015d4
      Authored by Jacob Keller
      Fix, or rather, avoid a sparse warning caused by the fact that
      csum_replace_by_diff expects to receive a __wsum value. Since the
      calculation appears to work, simply typecast the passed paylen value to
      __wsum to avoid the warning.
      
      This seems pretty fishy since __wsum was obviously annotated as
      a separate type on purpose, so this throws the entire calculation into
      question. Since it currently appears to behave as expected, the typecast
      is probably safe.
      
      Change-ID: I4fdc5cddd589abc16098176e8a61127e761488f4
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
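      The shape of the fix (the l4 header union name is assumed;
      csum_replace_by_diff comes from <net/checksum.h>):

        /* paylen is host-order; tell sparse the folded value really is
         * meant to be treated as a __wsum for the checksum update
         */
        csum_replace_by_diff(&l4.tcp->check,
                             (__force __wsum)htonl(paylen));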
    • i40e: Fix Adaptive ITR enabling · 3c234c47
      Authored by Carolyn Wyborny
      This patch fixes a bug introduced with the addition of per-queue
      ITR feature support in ethtool.  With that addition, functions were
      added which converted the ITR settings to binary values.  The
      IS_ENABLED macros that run on those values check whether a bit is
      set or not, and with the value being binary, the bit check always
      returned ITR disabled, which prevented any updating of the ITR rate.
      This patch fixes the problem by changing the functions to return the
      current ITR value instead, renaming them to better reflect their
      function.  These functions now provide a value which will be
      accurately assessed, so the ITR updates as intended.
      
      Change-ID: I14f1d088d052e27f652aaa3113e186415ddea1fc
      Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
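      The bug and fix in miniature (I40E_ITR_DYNAMIC as an assumed flag
      bit in the setting, per the driver headers):

        #define I40E_ITR_DYNAMIC 0x8000
        #define ITR_IS_DYNAMIC(setting) (!!((setting) & I40E_ITR_DYNAMIC))

        static void itr_example(struct i40e_ring *ring)
        {
                /* buggy: collapsing the setting to 0/1 strips the flag */
                u16 enabled = !!ring->rx_itr_setting;
                bool dyn = ITR_IS_DYNAMIC(enabled);     /* always false */

                /* fixed: assess the current ITR value, flag intact */
                dyn = ITR_IS_DYNAMIC(ring->rx_itr_setting);
        }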
  19. 12 Feb 2017 (3 commits)
  20. 03 Feb 2017 (1 commit)
  21. 07 Dec 2016 (1 commit)