1. 01 May, 2017 32 commits
  2. 30 Apr, 2017 8 commits
    • E
      net/mlx5: E-Switch, Avoid redundant memory allocation · 0a0ab1d2
      Eli Cohen committed
      struct esw_mc_addr is a small struct that can be part of struct
      mlx5_eswitch. Define it as a field rather than a pointer, saving the
      kzalloc call and its error flow handling.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
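A minimal userspace sketch of the idea in this commit, assuming simplified, hypothetical types (`demo_mc_addr`, `demo_eswitch`, and the `mc_promisc` field name are illustrative; the real `struct esw_mc_addr` and `struct mlx5_eswitch` differ). Embedding the small struct means one allocation and one failure point instead of two.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct esw_mc_addr; real fields differ. */
struct demo_mc_addr {
	unsigned char addr[6];
	unsigned int refcnt;
};

/* Hypothetical stand-in for struct mlx5_eswitch. */
struct demo_eswitch {
	int total_vports;
	struct demo_mc_addr mc_promisc; /* embedded: no separate allocation */
};

/* A single zeroed allocation covers the parent and the embedded struct. */
struct demo_eswitch *demo_eswitch_init(int total_vports)
{
	struct demo_eswitch *esw = calloc(1, sizeof(*esw));

	if (!esw)
		return NULL; /* the only allocation error path left */
	esw->total_vports = total_vports;
	return esw;
}

void demo_eswitch_cleanup(struct demo_eswitch *esw)
{
	free(esw); /* releases the embedded struct together with its parent */
}
```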
    • E
      net/mlx5e: Disable HW LRO when PCI is slower than link on striding RQ · 0f6e4cf6
      Eran Ben Elisha committed
      We will activate HW LRO only on servers with PCI BW > MAX LINK BW,
      or when PCI BW > 16Gbps. In other cases we do not want LRO by default, as
      LRO sessions might time out and add redundant software overhead.
      
      Tested:
      	ethtool -k <ifs-name> | grep large-receive-offload
      	On systems with and without the limitations.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
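The rule stated in this commit message can be sketched as a small predicate. This is an illustration only: the function name, the macro, and the choice of Mbps as the unit are assumptions, not the driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold from the commit text: 16 Gbps, expressed here in Mbps (assumed unit). */
#define DEMO_PCI_BW_THRESHOLD_MBPS 16000u

/*
 * Hypothetical helper: returns true when HW LRO should be enabled by default,
 * i.e. when PCI bandwidth exceeds the maximum link bandwidth or exceeds 16 Gbps.
 */
bool demo_hw_lro_default(uint32_t pci_bw_mbps, uint32_t max_link_mbps)
{
	return pci_bw_mbps > max_link_mbps ||
	       pci_bw_mbps > DEMO_PCI_BW_THRESHOLD_MBPS;
}
```

For example, a device with 8 Gbps of PCI bandwidth behind a 10 Gbps link would leave LRO off by default under this rule.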
    • T
      net/mlx5e: Use u8 as ownership type in mlx5e_get_cqe() · b1b03bde
      Tariq Toukan committed
      CQE ownership indication is as small as a single bit.
      Use u8 to speed up the comparison.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
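A rough sketch of a single-bit ownership check done on byte-sized values. The types and names are hypothetical, and the scheme shown (the owner bit must match the parity of the consumer's wrap count) is an assumption about how such rings commonly work, not a copy of `mlx5e_get_cqe()`.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CQE_OWNER_MASK 0x1u

/* Hypothetical completion queue entry. */
struct demo_cqe {
	uint8_t op_own; /* low bit carries the ownership indication */
};

/* Hypothetical completion queue. */
struct demo_cq {
	struct demo_cqe *ring;
	uint32_t size; /* number of entries, a power of two */
	uint32_t cc;   /* consumer counter, grows monotonically */
};

/* Returns the next CQE if software owns it, or NULL if it is not ready yet. */
struct demo_cqe *demo_get_cqe(struct demo_cq *cq)
{
	uint32_t ci = cq->cc & (cq->size - 1);
	struct demo_cqe *cqe = &cq->ring[ci];
	/* Both sides of the comparison are single bits held in u8 values. */
	uint8_t cqe_own = cqe->op_own & DEMO_CQE_OWNER_MASK;
	uint8_t sw_own = (cq->cc / cq->size) & 1;

	if (cqe_own != sw_own)
		return NULL;
	return cqe;
}
```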
    • T
      net/mlx5e: Use prefetchw when a write is to follow · ad78af9b
      Tariq Toukan committed
      "prefetchw()" prefetches the cacheline for write. Use it for
      skb->data, as we will soon be copying the packet header there.
      
      Performance:
      Single-stream packet-rate tested with pktgen.
      Packets are dropped at the tc level to zoom into the driver data-path.
      Larger gain is expected for smaller packets, as less time
      is spent on handling SKB fragments, making the path shorter
      and the improvement more significant.
      
      --------------------------------------------
      packet size | before    | after     | gain |
      --------------------------------------------
      64B         | 4,113,306 | 4,778,720 | 16%  |
      1024B       | 3,633,819 | 3,950,593 | 8.7% |
      --------------------------------------------
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
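A userspace sketch of the write-prefetch pattern, assuming a GCC- or Clang-compatible compiler. `demo_prefetchw()` and `demo_copy_header()` are hypothetical names; the kernel's `prefetchw()` is approximated here with `__builtin_prefetch(addr, 1)`, where the second argument `1` requests a prefetch for write.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for prefetchw(): hint that this cache line will be written. */
static inline void demo_prefetchw(const void *p)
{
	__builtin_prefetch(p, 1);
}

/* Hypothetical copy step: issue the hint early, then copy the header into dst. */
size_t demo_copy_header(unsigned char *dst, const unsigned char *src,
			size_t hdr_len)
{
	demo_prefetchw(dst);
	/* ... other per-packet work would run here while the line is fetched ... */
	memcpy(dst, src, hdr_len);
	return hdr_len;
}
```

The hint does not change results; any benefit depends on how much independent work separates the prefetch from the write.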
    • T
      net/mlx5e: Optimize poll ICOSQ completion queue · 1f5b1e47
      Tariq Toukan committed
      UMR operations are more frequent and important.
      Check them first, and add a compiler branch predictor hint.
      
      According to the current design, the ICOSQ CQ can contain at most one
      pending CQE per napi. The poll function is optimized accordingly.
      
      Performance:
      Single-stream packet-rate tested with pktgen.
      Packets are dropped at the tc level to zoom into the driver data-path.
      Larger gain is expected for larger packet sizes, as BW is higher
      and UMR posts are more frequent.
      
      --------------------------------------------
      packet size | before    | after     | gain |
      --------------------------------------------
      64B         | 4,092,370 | 4,113,306 | 0.5% |
      1024B       | 3,421,435 | 3,633,819 | 6.2% |
      --------------------------------------------
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
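A simplified sketch of the two optimizations named in this commit: handle at most one completion per poll, and test the common UMR case first with a branch hint. All names are hypothetical, and `demo_likely()` stands in for the kernel's `likely()` using the GCC/Clang `__builtin_expect` builtin.

```c
#include <stdbool.h>

#define demo_likely(x) __builtin_expect(!!(x), 1)

enum demo_icosq_op { DEMO_OP_UMR, DEMO_OP_NOP };

/* Hypothetical CQ state: by design at most one CQE is pending per poll. */
struct demo_icosq_cq {
	bool has_cqe;
	enum demo_icosq_op opcode;
	unsigned int umr_done;
	unsigned int nop_done;
};

/* Consumes the single pending completion, if any; returns 1 if one was handled. */
int demo_poll_ico_cq(struct demo_icosq_cq *cq)
{
	if (!cq->has_cqe)
		return 0;
	cq->has_cqe = false; /* no loop: there cannot be a second CQE */

	/* UMR completions are the frequent case, so check them first. */
	if (demo_likely(cq->opcode == DEMO_OP_UMR)) {
		cq->umr_done++;
		return 1;
	}
	cq->nop_done++;
	return 1;
}
```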
    • H
      net/mlx5e: Act on delay probe time updates · a2fa1fe5
      Hadar Hen Zion committed
      The user can change the delay_first_probe_time parameter through sysctl.
      Listen to NETEVENT_DELAY_PROBE_TIME_UPDATE notifications and update the
      intervals of the periodic task that updates the neighbours' 'used' value
      and of the periodic task that queries the flow HW counters.
      Both intervals are updated only if the new delay probe time value is
      lower than the current interval.

      Since the driver saves only one minimum interval value, not one per
      device, users can set a lower interval for the periodic task that
      updates the neighbours' 'used' value, but they cannot schedule a higher
      interval for it afterwards.
      The interval used for scheduling this periodic task is the minimal
      delay probe time value ever seen by the driver.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
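The "only ever lower the interval" behaviour described in this commit can be sketched as follows. The struct, field, and function names are hypothetical, and plain `unsigned long` values stand in for whatever time unit the driver uses.

```c
/* Hypothetical holder for the two periodic-task intervals. */
struct demo_intervals {
	unsigned long neigh_used_interval; /* neighbour 'used' update task */
	unsigned long fc_query_interval;   /* flow HW counters query task */
};

/*
 * Called on a delay probe time update. Each interval is lowered only when the
 * new value is smaller, so the stored value is the minimum ever seen.
 * Returns 1 if at least one interval changed, 0 otherwise.
 */
int demo_on_delay_probe_time_update(struct demo_intervals *iv,
				    unsigned long new_delay)
{
	int changed = 0;

	if (new_delay < iv->neigh_used_interval) {
		iv->neigh_used_interval = new_delay;
		changed = 1;
	}
	if (new_delay < iv->fc_query_interval) {
		iv->fc_query_interval = new_delay;
		changed = 1;
	}
	return changed;
}
```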
    • H
      net/mlx5e: Update neighbour 'used' state using HW flow rules counters · f6dfb4c3
      Hadar Hen Zion committed
      When IP tunnel encapsulation rules are offloaded, the kernel can't see
      the traffic of the offloaded flow. The neighbour for the IP tunnel
      destination of the offloaded flow can mistakenly become STALE and be
      deleted by the kernel, since its 'used' value wasn't updated.
      
      To make sure that a neighbour which is used by the HW won't become
      STALE, we proactively update the neighbour 'used' value every
      DELAY_PROBE_TIME period, when packets were matched and counted by the HW
      for one of the tunnel encap flows related to this neighbour.
      
      The periodic task that updates the used neighbours is scheduled when a
      tunnel encap rule is successfully offloaded into HW and keeps re-scheduling
      itself as long as the representor's neighbours list isn't empty.
      
      Add, remove, lookup and status change operations done over the
      representor's neighbours list or the neighbour hash entry encaps list
      are all serialized by RTNL lock.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
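One way to picture the periodic check from this commit is the sketch below: a neighbour is refreshed only if some HW flow counter tied to it reports use newer than what was already reported. The types are hypothetical, the `used_event_sent` flag merely stands in for refreshing the kernel neighbour entry, and timestamp wraparound is ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-neighbour bookkeeping. */
struct demo_neigh {
	unsigned long reported_lastuse;    /* newest HW use already reported */
	const unsigned long *flow_lastuse; /* last-use time per encap flow counter */
	size_t n_flows;
	bool used_event_sent;              /* stand-in for refreshing the neighbour */
};

/* Returns true if HW traffic was seen since the last check. */
bool demo_update_neigh_used(struct demo_neigh *n)
{
	unsigned long newest = n->reported_lastuse;
	size_t i;

	for (i = 0; i < n->n_flows; i++)
		if (n->flow_lastuse[i] > newest)
			newest = n->flow_lastuse[i];

	if (newest == n->reported_lastuse)
		return false; /* no new HW hits: leave the neighbour alone */

	n->reported_lastuse = newest;
	n->used_event_sent = true;
	return true;
}
```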
    • H
      net/mlx5e: Add support to neighbour update flow · 232c0013
      Hadar Hen Zion committed
      In order to offload TC encap rules, the driver does a lookup for the IP
      tunnel neighbour according to the output device and the destination IP
      given by the user.
      
      To keep track of the validity state of such neighbours, we keep
      the neighbours' information (a pair of device pointer and destination IP)
      in a hash table maintained at the relevant egress representor and
      register to get NETEVENT_NEIGH_UPDATE events. On receiving a neighbour
      update netevent, we search for a match among the cached neighbour entries
      used for encapsulation.
      
      In case the neighbour isn't valid, we can't offload the flow into the
      HW. We cache the flow (requested matching and actions) in the driver and
      offload the rule later, when the neighbour is resolved and becomes
      valid.
      
      When a flow is only cached in the driver and not offloaded into HW
      yet, we use the EAGAIN return value to mark it internally; the TC ndo
      still returns success.
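
The internal use of EAGAIN described above could look roughly like this. The function and struct names are hypothetical and the offload step is reduced to a flag.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flow state: cached in the driver, possibly offloaded to HW. */
struct demo_flow {
	bool offloaded;
};

/* Internal step: returns 0 when offloaded, -EAGAIN when the flow is only cached. */
int demo_attach_encap(struct demo_flow *flow, bool neigh_valid)
{
	flow->offloaded = neigh_valid;
	return neigh_valid ? 0 : -EAGAIN;
}

/* TC-facing entry point: -EAGAIN stays internal, so the caller sees success. */
int demo_tc_add_flow(struct demo_flow *flow, bool neigh_valid)
{
	int err = demo_attach_encap(flow, neigh_valid);

	if (err == -EAGAIN)
		return 0; /* cached now, offloaded later on a neighbour update */
	return err;
}
```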
      
      Listen to kernel neighbour update netevents to track the validity state
      of the relevant neighbours:
      
      1. If a neighbour becomes valid, offload the related rules to HW.
      
      2. If the neighbour becomes invalid, remove the related rules from HW.
      
      3. If the neighbour mac address was changed, update the encap header.
         Remove all the offloaded rules using the old encap header from the HW
         and insert new rules to HW with updated encap header.
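
The three cases above can be sketched as one update handler. This is an illustration under simplifying assumptions: the names are hypothetical, a single flag represents "rules offloaded to HW", and the encap header is reduced to the destination MAC.

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* Hypothetical encap entry: header reduced to the destination MAC. */
struct demo_encap {
	bool offloaded;
	unsigned char dst_mac[DEMO_ETH_ALEN];
};

/* mac may be NULL when the neighbour is not valid. */
void demo_neigh_update(struct demo_encap *e, bool neigh_valid,
		       const unsigned char *mac)
{
	bool mac_changed = neigh_valid &&
			   memcmp(e->dst_mac, mac, DEMO_ETH_ALEN) != 0;

	/* Cases 2 and 3: remove rules that use an invalid or outdated header. */
	if (e->offloaded && (!neigh_valid || mac_changed))
		e->offloaded = false;

	/* Cases 1 and 3: build the header from the current MAC and offload. */
	if (neigh_valid && !e->offloaded) {
		memcpy(e->dst_mac, mac, DEMO_ETH_ALEN);
		e->offloaded = true;
	}
}
```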
      
      Access to the neighbors hash table is protected by RTNL lock of its
      caller or by the table's spinlock.
      
      Details of the locking/synchronization among the different actions
      applied on the neighbour table:
      
      Add/remove operations - protected by RTNL lock of its caller (all TC
      commands are protected by RTNL lock). Add and remove operations are
      initiated only when the user inserts/removes a TC rule into/from the driver.
      
      Lookup/remove operations - since the lookup operation is done from
      netevent notifier block, RTNL lock can't be used (atomic context).
      Use the table's spin lock to protect lookups from TC user removal operation.
      bh is used since netevent can be called from a softirq context.
      
      Lookup/add operations - The hash table access functions are taking
      care of the protection between lookup and add operations.
      
      When adding/removing encap headers and rules to/from the HW, RTNL lock
      is used. It can happen when:
      
      1. The user inserts/removes a TC rule into/from the driver (TC commands
      are protected by the RTNL lock of their caller).
      
      2. The driver gets neighbour notification event, which reports about
      neighbour validity status change. Before adding/removing encap headers
      and rules to/from the HW, RTNL lock is taken.
      
      A neighbour hash table entry should be freed when its encap list is empty.
      Since the neighbour update netevent notification schedules a neighbour
      update work that uses the neighbour hash entry, it can't be freed
      unconditionally when the encap list becomes empty during the TC delete
      rule flow. Use a reference count to avoid freeing a neighbour hash table
      entry while it's still in use.
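
A bare-bones sketch of the reference counting described above. The names are hypothetical, and the plain `int` counter is for illustration only; concurrent kernel code would need an atomic reference count.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical neighbour hash entry; only the reference count is modelled. */
struct demo_nhe {
	int refcnt;
};

/* The creator holds the initial reference on behalf of the encap list. */
struct demo_nhe *demo_nhe_create(void)
{
	struct demo_nhe *nhe = calloc(1, sizeof(*nhe));

	if (nhe)
		nhe->refcnt = 1;
	return nhe;
}

/* Taken, for example, when a neighbour update work is scheduled. */
void demo_nhe_hold(struct demo_nhe *nhe)
{
	nhe->refcnt++;
}

/* Drops one reference; frees the entry and returns true on the last put. */
bool demo_nhe_put(struct demo_nhe *nhe)
{
	if (--nhe->refcnt > 0)
		return false;
	free(nhe);
	return true;
}
```

With this shape, a TC delete that empties the encap list only drops its own reference, and the entry survives until the pending update work drops the last one.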
      
      When the user asks to unregister a netdevice used by one of the neighbours,
      a neighbour removal notification is received. We then take a reference on
      the neighbour and don't free it until the relevant encap entries (and
      flows) are marked as invalid (not offloaded) and removed from HW.
      As long as the encap entry is still valid (checked under RTNL lock) we
      can safely access the neighbour device saved in the mlx5e_neigh struct.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>