1. 09 7月, 2014 1 次提交
    • A
      net/mlx4_en: Ignore budget on TX napi polling · fbc6daf1
      Amir Vadai 提交于
      It is recommended that TX work not count against the quota.
      The cost of TX packet liberation is a minute percentage of what it costs to
      process an RX frame. Furthermore, that SKB freeing makes memory available for
      other paths in the stack.
      
      Give the TX a larger budget and be more aggressive about cleaning up the Tx
      descriptors this budget could be changed using ethtool:
      $ ethtool -C eth1 tx-frames-irq <budget>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fbc6daf1
  2. 03 7月, 2014 1 次提交
  3. 03 6月, 2014 1 次提交
  4. 15 5月, 2014 1 次提交
  5. 09 5月, 2014 1 次提交
  6. 13 3月, 2014 1 次提交
  7. 03 3月, 2014 4 次提交
  8. 17 2月, 2014 1 次提交
  9. 11 1月, 2014 1 次提交
    • J
      net: core: explicitly select a txq before doing l2 forwarding · f663dd9a
      Jason Wang 提交于
      Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
      will cause several issues:
      
      - NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
        instead of lower device which misses the necessary txq synchronization for
        lower device such as txq stopping or frozen required by dev watchdog or
        control path.
      - dev_hard_start_xmit() was called with NULL txq which bypasses the net device
        watchdog.
      - dev_hard_start_xmit() does not check txq everywhere which will lead a crash
        when tso is disabled for lower device.
      
      Fix this by explicitly introducing a new param for .ndo_select_queue() for just
      selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
      extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
      forwarding transmission.
      
      With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
      to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
      a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
      dev_queue_xmit() to do the transmission.
      
      In the future, it was also required for macvtap l2 forwarding support since it
      provides a necessary synchronization method.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: e1000-devel@lists.sourceforge.net
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f663dd9a
  10. 01 1月, 2014 1 次提交
    • O
      net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling · 837052d0
      Or Gerlitz 提交于
      When the device tunneling offloads mode is vxlan do the following
      
       - call SET_PORT with the relevant setting
      
       - add DMFS steering vxlan rule for the device self and multicast mac addresses
         of the form: {<ETH, outer-mac> <VXLAN, ANY vnid> <ETH, ANY mac>} --> RSS QP
      
       - set relevant QPC fields in RSS context and RX ring QPs
      
       - in TX flow, set WQE fields to generate HW checksum, and handle gso skbs
         which are marked for encapsulation such that the HW will segment them properly.
      
       - in RX flow, read HW offloaded checksum for encapsulated packets from the CQE
      
       - advertize hw_enc_features and NETIF_F_GSO_UDP_TUNNEL to the networking stack
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      837052d0
  11. 20 12月, 2013 2 次提交
  12. 08 11月, 2013 2 次提交
  13. 22 8月, 2013 3 次提交
  14. 29 7月, 2013 1 次提交
    • E
      net/mlx4_en: Fix BlueFlame race · 2d4b6466
      Eugenia Emantayev 提交于
      Fix a race between BlueFlame flow and stamping in post send flow.
      Example:
      	SW: Build WQE 0 on the TX buffer, except the ownership bit
      	SW: Set ownership for WQE 0 on the TX buffer
      	SW: Ring doorbell for WQE 0
      	SW: Build WQE 1 on the TX buffer, except the ownership bit
      	SW: Set ownership for WQE 1 on the TX buffer
      	HW: Read WQE 0 and then WQE 1, before doorbell was rung/BF was done for WQE 1
      	HW: Produce CQEs for WQE 0 and WQE 1
      	SW: Process the CQEs, and stamp WQE 0 and WQE 1 accordingly (on the TX buffer)
      	SW: Copy WQE 1 from the TX buffer to the BF register - ALREADY STAMPED!
      	HW: CQE error with index 0xFFFF  - the BF WQE's control segment is STAMPED,
      		so the BF index is 0xFFFF. Error: Invalid Opcode.
      As a result QP enters the error state and no traffic can be sent.
      
      Solution:
      When stamping - do not stamp last completed wqe.
      Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d4b6466
  15. 05 6月, 2013 1 次提交
  16. 25 4月, 2013 2 次提交
  17. 08 2月, 2013 2 次提交
  18. 28 1月, 2013 1 次提交
  19. 21 1月, 2013 1 次提交
  20. 19 1月, 2013 1 次提交
  21. 03 12月, 2012 1 次提交
  22. 27 11月, 2012 1 次提交
    • O
      mlx4: 64-byte CQE/EQE support · 08ff3235
      Or Gerlitz 提交于
      ConnectX-3 devices can use either 64- or 32-byte completion queue
      entries (CQEs) and event queue entries (EQEs).  Using 64-byte
      EQEs/CQEs performs better because each entry is aligned to a complete
      cacheline.  This patch queries the HCA's capabilities, and if it
      supports 64-byte CQEs and EQES the driver will configure the HW to
      work in 64-byte mode.
      
      The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.
      
      Since this mode is global, userspace (libmlx4) must be updated to work
      with the configured CQE size, and guests using SR-IOV virtual
      functions need to know both EQE and CQE size.
      
      In case one of the 64-byte CQE/EQE capabilities is activated, the
      patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
      command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that
      they need an update to be able to work with the PPF. This is done by
      changing the returned pf_context_behaviour not to be zero any more. In
      case none of these capabilities is activated that value remains zero
      and older guest drivers can run OK.
      
      The SRIOV related flow is as follows
      
      1. the PPF does the detection of the new capabilities using
         QUERY_DEV_CAP command.
      
      2. the PPF activates the new capabilities using INIT_HCA.
      
      3. the VF detects if the PPF activated the capabilities using
         QUERY_HCA, and if this is the case activates them for itself too.
      
      Note that the VF detects that it must be aware to the new PF behaviour
      using QUERY_FUNC_CAP.  Steps 1 and 2 apply also for native mode.
      
      User space notification is done through a new field introduced in
      struct mlx4_ib_ucontext which holds device capabilities for which user
      space must take action. This changes the binary interface so the ABI
      towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
      when **needed** i.e. only when the driver does use 64-byte CQEs or
      future device capabilities which must be in sync by user space. This
      practice allows to work with unmodified libmlx4 on older devices (e.g
      A0, B0) which don't support 64-byte CQEs.
      
      In order to keep existing systems functional when they update to a
      newer kernel that contains these changes in VF and userspace ABI, a
      module parameter enable_64b_cqe_eqe must be set to enable 64-byte
      mode; the default is currently false.
      Signed-off-by: NEli Cohen <eli@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      08ff3235
  23. 26 10月, 2012 2 次提交
  24. 02 10月, 2012 1 次提交
  25. 04 8月, 2012 1 次提交
  26. 18 5月, 2012 1 次提交
    • A
      net/mlx4_en: num cores tx rings for every UP · bc6a4744
      Amir Vadai 提交于
      Change the TX ring scheme such that the number of rings for untagged packets
      and for tagged packets (per each of the vlan priorities) is the same, unlike
      the current situation where for tagged traffic there's one ring per priority
      and for untagged rings as the number of core.
      
      Queue selection is done as follows:
      
      If the mqprio qdisc is operates on the interface, such that the core networking
      code invoked the device setup_tc ndo callback, a mapping of skb->priority =>
      queue set is forced - for both, tagged and untagged traffic.
      
      Else, the egress map skb->priority =>  User priority is used for tagged traffic, and
      all untagged traffic is sent through tx rings of UP 0.
      
      The patch follows the convergence of discussing that issue with John Fastabend
      over this thread http://comments.gmane.org/gmane.linux.network/229877
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Liran Liss <liranl@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc6a4744
  27. 24 4月, 2012 2 次提交
  28. 05 4月, 2012 2 次提交
    • A
      net/mlx4_en: sk_prio <=> UP for untagged traffic · 897d7846
      Amir Vadai 提交于
      Since vlan egress map is only good for tagged traffic, need to have other
      mapping to be used by untagged traffic.
      For that, the driver uses sch_mqprio mapping. This mapping could be set by
      using tc tool from iproute2 package.
      Mapped UP will be used by the HW for QoS purposes, but won't go out on the
      wire.
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      897d7846
    • A
      net/mlx4_en: Force user priority by QP attribute · 0e98b523
      Amir Vadai 提交于
      Instead of relying on HW to change schedule queue by UP, schedule
      queue is fixed for a tx_ring, and UP in WQE is ignored in this aspect.  This
      resolves two issues with untagged traffic:
      1. untagged traffic has no UP in packet which is needed for QoS. The change
         above allows setting the schedule queue (and by that the UP) of such a stream.
      2. BlueFlame uses the same field used by vlan tag. So forcing UP from QPC
         allows using BF for untagged but prioritized traffic.
      
      In old firmware that force UP is not supported, untagged traffic will not subject to
      QoS.
      
      Because UP is set by QP, need to always have a tx ring per UP, even if pfcrx
      module paramter is false.
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e98b523