1. 07 Dec, 2016 1 commit
  2. 01 Nov, 2016 4 commits
    • i40e: Reorder logic for coalescing RS bits · 1dc8b538
      Committed by Alexander Duyck
      This patch reorders the logic at the end of i40e_tx_map to address the
      fact that the logic was rather convoluted and much larger than it needed
      to be.
      
      In order to try and coalesce the code paths I have updated some of the
      comments and repurposed some of the variables in order to reduce
      unnecessary overhead.
      
      This patch does the following:
      1.  Quit tracking skb->xmit_more with a flag, just max out packet_stride
      2.  Drop tail_bump and do_rs and instead just use desc_count and td_cmd
      3.  Pull comments from ixgbe that make the need for wmb() more explicit.
      
      Change-ID: Ic7da85ec75043c634e87fef958109789bcc6317c
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Add common function for finding VSI by type · 4b816446
      Committed by Alexander Duyck
      This patch adds a common method for finding a VSI by type.  The main
      motivation for doing this is that the Flow Director path actually had two
      ways of handling this, one stopped on first match and one did not.  This
      patch makes it so that all callers of this function will get the same
      approach for finding a VSI.
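
As a rough illustration of a shared lookup helper, the sketch below returns the first VSI of a given type. The structure, the enum values, and the choice of first-match behaviour are assumptions made for this example; the message above does not say which of the two earlier behaviours the common function adopted.

```c
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for the driver's structures. */
enum vsi_type_sketch { VSI_MAIN, VSI_FDIR, VSI_VMDQ };

struct vsi_sketch {
    enum vsi_type_sketch type;
    int id;
};

/* Return the first non-NULL VSI of the requested type, or NULL if none.
 * Stopping on the first match is an assumption of this sketch.
 */
struct vsi_sketch *find_vsi_by_type(struct vsi_sketch **vsis, size_t count,
                                    enum vsi_type_sketch type)
{
    for (size_t i = 0; i < count; i++) {
        if (vsis[i] && vsis[i]->type == type)
            return vsis[i];
    }
    return NULL;
}
```

With a single helper, every caller sees the same result for the same table, which is the consistency the patch is after.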
      
      Change-ID: Ibf25de8acd8466582520694424aa87da66965fbd
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: replace PTP Rx timestamp hang logic · 12490501
      Committed by Jacob Keller
      The current Rx timestamp hang logic is not very robust because it does
      not notice a register is hung until all four timestamps have been
      latched and we wait a full 5 seconds. Replace this logic with a newer Rx
      hang detection based on storing the jiffies when we first notice
      a receive timestamp event. We store each register's time separately,
      along with a flag indicating if it is currently latched. Upon first
      transitioning to latch, we will update the latch_events[i] jiffies
      value. This indicates the time we first noticed this event. The watchdog
      routine will simply check that either the flag has been cleared, or
      we have passed at least one second. In this case, it is able to clear
      the Rx timestamp register under the assumption that it was for a dropped
      frame. The benefit of this strategy is that we should be able to
      detect and clear out stalled RXTIME_H registers before we exhaust the
      supply of 4, and avoid complete stall of Rx timestamp events.
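
The per-register tracking described above can be sketched roughly as follows. This is a user-space model, not the driver code: the struct, the function names, the tick-based timeout constant, and the bitmask representation of the latch state are all assumptions for illustration. Only latch_events[i], the four registers, and the one-second threshold come from the message.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_TS_REGS         4     /* the message mentions a supply of 4 */
#define HANG_TIMEOUT_TICKS 1000  /* stands in for one second of jiffies (assumed) */

struct rx_ts_state {
    bool     latched[RX_TS_REGS];      /* is register i currently latched? */
    uint64_t latch_events[RX_TS_REGS]; /* tick when the latch was first seen */
};

/* Record the time only on the transition into the latched state.
 * hw_latched is a hypothetical bitmask, one bit per register.
 */
void rx_ts_note(struct rx_ts_state *s, unsigned hw_latched, uint64_t now)
{
    for (int i = 0; i < RX_TS_REGS; i++) {
        bool is_latched = (hw_latched >> i) & 1u;

        if (is_latched && !s->latched[i])
            s->latch_events[i] = now;
        s->latched[i] = is_latched;
    }
}

/* Watchdog pass: return a bitmask of registers that have stayed latched
 * for at least the timeout and should be cleared as stale.
 */
unsigned rx_ts_watchdog(struct rx_ts_state *s, uint64_t now)
{
    unsigned stale = 0;

    for (int i = 0; i < RX_TS_REGS; i++) {
        if (s->latched[i] &&
            now - s->latch_events[i] >= HANG_TIMEOUT_TICKS) {
            stale |= 1u << i;
            s->latched[i] = false;
        }
    }
    return stale;
}
```

Because each register carries its own timestamp, one stalled register can be reclaimed without waiting for the other three to fill up.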
      
      Change-ID: Id55458c0cd7a5dd0c951ff2b8ac0b2509364131f
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: correct check for reading TSYNINDX from the receive descriptor · 144ed176
      Committed by Jacob Keller
      When hardware has taken a timestamp for a received packet, it indicates
      which RXTIME register the timestamp was placed in by some bits in the
      receive descriptor. It uses 3 bits, one to indicate if the descriptor
      index is valid (i.e. there was a timestamp) and 2 bits to indicate which
      of the 4 registers to read. However, the driver currently does not check
      the TSYNVALID bit and only checks the index. It assumes a zero index
      means no timestamp, and a non zero index means a timestamp occurred.
      While this appears to be true, it prevents ever reading a timestamp in
      RXTIME[0], and causes the first timestamp the device captures to be
      ignored.
      
      Fix this by using the TSYNVALID bit correctly as the true indicator of
      whether the packet has an associated timestamp.
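
A small sketch of the difference between the two checks follows. The bit positions and macro names here are hypothetical, chosen only to model "one valid bit plus a 2-bit index"; the real descriptor field has its own shift and mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for illustration: bits 0-1 index, bit 2 valid. */
#define TSYN_INDEX_MASK 0x3u
#define TSYN_VALID_BIT  0x4u

/* Fixed check: the valid bit alone decides whether a timestamp exists. */
bool rx_ts_index(uint32_t tsyn, unsigned *index)
{
    if (!(tsyn & TSYN_VALID_BIT))
        return false;
    *index = tsyn & TSYN_INDEX_MASK;
    return true;
}

/* Old check, for contrast: a zero index is treated as "no timestamp",
 * so a valid timestamp in register 0 is never read.
 */
bool rx_ts_index_old(uint32_t tsyn, unsigned *index)
{
    unsigned idx = tsyn & TSYN_INDEX_MASK;

    if (idx == 0)
        return false;
    *index = idx;
    return true;
}
```

With the valid bit set and index 0, the fixed check reports register 0 while the old check reports nothing, which is the lost first timestamp the message describes.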
      
      Also rename the variable rsyn to tsyn as this is more descriptive and
      matches the register names.
      
      Change-ID: I4437e8f3a3df2c2ddb458b0fb61420f3dafc4c12
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  3. 29 Oct, 2016 4 commits
    • i40e: Drop redundant Rx descriptor processing code · 99dad8b3
      Committed by Alexander Duyck
      This patch cleans up several pieces of redundant code in the Rx clean-up
      paths.
      
      The first bit is that the hdr_addr and status_err_len portions of the Rx
      descriptor represent the same value.  As such there is no point in setting
      one to 0 and then setting the other to 0.  I'm dropping the second spot
      where we update the value to 0 so that we only have 1 write for this value
      instead of 2.
      
      The second piece is the checking for the DD bit in the packet.  We only
      need to check for a non-zero value for the status_err_len because if the
      device is done with the descriptor it will have written something back and
      the DD is just one piece of it.  In addition I have moved the reading of
      the Rx descriptor bits related to rx_ptype down so that they are actually
      below the dma_rmb() call so that we are guaranteed that we don't have any
      funky 64b on 32b calls causing any ordering issues.
      
      Change-ID: I256e44a025d3c64a7224aaaec37c852bfcb1871b
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: fix interrupt affinity bug · 96db776a
      Committed by Alan Brady
      There exists a bug in which a 'perfect storm' can occur and cause
      interrupts to fail to be correctly affinitized. This causes unexpected
      behavior and has a substantial impact on performance when it happens.
      
      The bug occurs if there is heavy traffic, any number of CPUs that have
      an i40e interrupt are pegged at 100%, and the interrupt affinity for
      those CPUs is changed.  Instead of moving to the new CPU, the interrupt
      continues to be polled while there is heavy traffic.
      
      The bug is most readily realized as the driver is first brought up and
      all interrupts start on CPU0. If there is heavy traffic and the
      interrupt starts polling before the interrupt is affinitized, the
      interrupt will be stuck on CPU0 until traffic stops. The bug, however,
      can also be triggered more simply by affinitizing all the interrupts
      to a single CPU and then attempting to move any of those interrupts off
      while there is heavy traffic.
      
      This patch fixes the bug by registering for update notifications from
      the kernel when the interrupt affinity changes. When that fires, we
      cache the intended affinity mask. Then, while polling, if the cpu is
      pegged at 100% and we failed to clean the rings, we check to make sure
      we have the correct affinity and stop polling if we're firing on the
      wrong CPU.  When the kernel successfully moves the interrupt, it will
      start polling on the correct CPU. The performance impact is minimal
      since the only time this section gets executed is when performance is
      already compromised by the CPU.
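
The polling decision described above can be modelled roughly like this. It is a user-space sketch under stated assumptions: the struct, the function names, and the 64-bit mask standing in for a CPU mask are hypothetical, and the real driver works with the kernel's affinity notifier and cpumask types instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vector state: one bit per CPU, CPUs 0..63 only. */
struct q_vector_sketch {
    uint64_t affinity_mask; /* cached from the affinity-change notification */
};

/* Models the notification callback: remember the intended affinity. */
void affinity_changed(struct q_vector_sketch *q, uint64_t new_mask)
{
    q->affinity_mask = new_mask;
}

/* Decide whether to stay in polling mode on the CPU we are running on.
 * clean_complete is true when the rings were fully cleaned this pass.
 */
bool keep_polling(const struct q_vector_sketch *q, bool clean_complete,
                  unsigned cpu)
{
    if (clean_complete)
        return false; /* work is done, leave polling as usual */

    /* Work remains: keep polling only if this CPU is in the cached mask,
     * otherwise stop so the interrupt can fire on the intended CPU.
     */
    return (q->affinity_mask >> cpu) & 1u;
}
```

The affinity test sits behind the "failed to clean" condition, so it only runs when the CPU is already saturated, matching the message's point about minimal overhead.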
      
      Change-ID: I4410a880159b9dba1f8297aa72bef36dca34e830
      Signed-off-by: Alan Brady <alan.brady@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Drop code for unsupported flow types · e1da71ca
      Committed by Alexander Duyck
      We cannot currently support SCTP in the hardware, and IPV4_FLOW is not used
      anywhere by the software so we can go through and drop the functionality
      related to these two flow types.
      
      In addition we cannot support masking based on the protocol value so if the
      user is expecting a value other than TCP or UDP we should simply return an
      error rather than trying to allocate a filter for a rule that will only
      partially match what the user requested.
      
      Change-ID: I10d52bb97d8104d76255fe244551814ff9531a63
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Rewrite Flow Director busy wait loop · ed245406
      Committed by Alexander Duyck
      We can reorder the busy wait loop at the start of the Flow Director
      transmit function to reduce the overall code size while still retaining the
      same functionality.  As such I am taking advantage of the opportunity to do
      so.
      
      Change-ID: I34c403ca001953c6ac9816e65d5305e73d869026
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  4. 25 Sep, 2016 6 commits
  5. 23 Sep, 2016 1 commit
  6. 20 Aug, 2016 1 commit
  7. 22 Jul, 2016 1 commit
  8. 15 Jul, 2016 1 commit
    • i40e/i40evf: Fix i40e_rx_checksum · 858296c8
      Committed by Alexander Duyck
      There are a couple of issues I found in i40e_rx_checksum while doing some
      recent testing.  As a result I have found the Rx checksum logic is pretty
      much broken and returning that the checksum is valid for tunnels in cases
      where it is not.
      
      First, the inner types are not the correct values to use to test whether a
      tunnel is present.  In addition, the inner protocol types are not a
      bitmask, so performing an OR of the values doesn't make sense.  I have
      instead changed the code so that the inner protocol types are used to
      determine if we report CHECKSUM_UNNECESSARY or not.  For anything that does
      not end in UDP, TCP, or SCTP it doesn't make much sense to report a
      checksum offload since it won't contain a checksum anyway.
      
      This leaves us with the need to set the csum_level based on some value.
      For that purpose I am using the tunnel_type field.  If the tunnel type is
      GRENAT or greater then this means we have a GRE or UDP tunnel with an inner
      header.  In the case of GRE or UDP we will have a possible checksum present
      so for this reason it should be safe to set the csum_level to 1 to indicate
      that we are reporting the state of the inner header.
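
The decision described above can be sketched as follows. The enum names, their ordering (apart from the "GRENAT or greater" comparison), the result struct, and the hw_csum_error input are assumptions for illustration; the driver derives these values from the decoded packet type and descriptor error bits.

```c
#include <stdbool.h>

/* Hypothetical inner protocol values; deliberately not a bitmask. */
enum inner_prot_sketch { PROT_NONE, PROT_UDP, PROT_TCP, PROT_SCTP, PROT_ICMP };

/* Hypothetical ordering; only "GRENAT or greater" comes from the message. */
enum tunnel_type_sketch {
    TUNNEL_NONE,
    TUNNEL_IP_IP,
    TUNNEL_GRENAT,
    TUNNEL_GRENAT_MAC
};

struct csum_report {
    bool unnecessary; /* models reporting CHECKSUM_UNNECESSARY */
    int  csum_level;  /* 1 when the state of the inner header is reported */
};

struct csum_report rx_checksum_sketch(enum inner_prot_sketch prot,
                                      enum tunnel_type_sketch tunnel,
                                      bool hw_csum_error)
{
    struct csum_report r = { false, 0 };

    if (hw_csum_error)
        return r;

    /* Only frames ending in UDP, TCP, or SCTP carry a checksum to report. */
    switch (prot) {
    case PROT_UDP:
    case PROT_TCP:
    case PROT_SCTP:
        break;
    default:
        return r;
    }

    r.unnecessary = true;
    if (tunnel >= TUNNEL_GRENAT)
        r.csum_level = 1; /* GRE or UDP tunnel with an inner header */
    return r;
}
```

Comparing the protocol against explicit values, rather than OR-ing them together, is what avoids the false "valid" reports mentioned at the top of the message.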
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  9. 21 May, 2016 2 commits
  10. 14 May, 2016 1 commit
  11. 06 May, 2016 3 commits
  12. 02 May, 2016 1 commit
  13. 28 Apr, 2016 1 commit
  14. 26 Apr, 2016 1 commit
  15. 14 Apr, 2016 1 commit
  16. 07 Apr, 2016 3 commits
  17. 06 Apr, 2016 1 commit
  18. 05 Apr, 2016 4 commits
  19. 20 Feb, 2016 1 commit
  20. 19 Feb, 2016 2 commits
    • i40e/i40evf: Move Tx checksum closer to TSO · 3bc67973
      Committed by Alexander Duyck
      On all of the other Intel drivers we place checksum close to TSO as they
      have a significant amount in common and it can help to reduce the decision
      tree for how to handle the frame as the first check in TSO is to see if
      checksumming is offloaded, and if it is not we can skip _BOTH_ TSO and Tx
      checksum offload based on a single check.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: Rewrite logic for 8 descriptor per packet check · 2d37490b
      Committed by Alexander Duyck
      This patch is meant to rewrite the logic for how we determine if we can
      transmit the frame or if it needs to be linearized.
      
      The previous code for this function was using a mix of division and modulus
      division as a part of computing if we need to take the slow path.  Instead
      I have replaced this by simply working with a sliding window which will
      tell us if the frame would be capable of causing a single packet to span
      several descriptors.
      
      The logic for the scan is fairly simple.  If any given group of 6 fragments
      is less than gso_size - 1 then it is possible for us to have one byte
      coming out of the first fragment, 6 fragments, and one or more bytes coming
      out of the last fragment.  This gives us a total of 8 fragments
      which exceeds what we can allow so we send such frames to be linearized.
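
The sliding-window scan can be sketched over a plain array of fragment lengths. This is a simplified model, not the driver function: the function name and array interface are made up for the example, the linear header area is ignored, and every window of 6 is checked, which may flag slightly more frames than the real code.

```c
#include <stdbool.h>
#include <stddef.h>

#define FRAG_WINDOW 6 /* group size named in the message */

/* Return true if some run of 6 consecutive fragments totals less than
 * gso_size - 1, meaning one segment could span 8 buffers.
 */
bool needs_linearize(const unsigned *frag_len, size_t nr_frags,
                     unsigned gso_size)
{
    unsigned long sum = 0;

    if (nr_frags < FRAG_WINDOW || gso_size == 0)
        return false;

    for (size_t i = 0; i < nr_frags; i++) {
        sum += frag_len[i];                    /* newest fragment enters */
        if (i >= FRAG_WINDOW)
            sum -= frag_len[i - FRAG_WINDOW];  /* stale fragment leaves */

        /* sum + 1 < gso_size is sum < gso_size - 1 without underflow */
        if (i >= FRAG_WINDOW - 1 && sum + 1 < gso_size)
            return true;
    }
    return false;
}
```

Each fragment is added once and subtracted once, so the scan is linear in the number of fragments and needs no division or modulus.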
      
      Arguably the use of modulus might be more exact as the approach I propose
      may generate some false positives.  However the likelihood of us taking much
      of a hit for those false positives is fairly low, and I would rather not
      add more overhead in the case where we are receiving a frame composed of 4K
      pages.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>