1. 31 5月, 2015 1 次提交
    • M
      net/mlx4: Add EQ pool · c66fa19c
      Matan Barak 提交于
      Previously, mlx4_en allocated EQs and used them exclusively.
      This affected RoCE performance, as applications which are
      events sensitive were limited to use only the legacy EQs.
      
      Change that by introducing an EQ pool. This pool is managed
      by mlx4_core. EQs are assigned to ports (when there are limited
      number of EQs, multiple ports could be assigned to the same EQs).
      
      An exception to this rule is the ASYNC EQ which handles various events.
      
      Legacy EQs are completely removed as all EQs could be shared.
      
      When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for
      EQ serving on a specific port. The core driver calculates which
      EQ should be assigned to that request.
      
      Because IRQs are shared between IB and Ethernet modules, their
      names only include the PCI device BDF address.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NIdo Shamay <idos@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c66fa19c
  2. 01 5月, 2015 1 次提交
  3. 10 4月, 2015 1 次提交
    • A
      mlx4/mlx5: Use dma_wmb/rmb where appropriate · 12b3375f
      Alexander Duyck 提交于
      This patch should help to improve the performance of the mlx4 and mlx5 on a
      number of architectures.  For example, on x86 the dma_wmb/rmb equates out
      to a barrer() call as the architecture is already strong ordered, and on
      PowerPC the call works out to a lwsync which is significantly less expensive
      than the sync call that was being used for wmb.
      
      I placed the new barriers between any spots that seemed to be trying to
      order memory/memory reads or writes, if there are any spots that involved
      MMIO I left the existing wmb in place as the new barriers cannot order
      transactions between coherent and non-coherent memories.
      
      v2: Reduced the replacments to just the spots where I could clearly
          identify the usage pattern.
      
      Cc: Amir Vadai <amirv@mellanox.com>
      Cc: Ido Shamay <idos@mellanox.com>
      Cc: Eli Cohen <eli@mellanox.com>
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12b3375f
  4. 03 4月, 2015 1 次提交
  5. 05 2月, 2015 2 次提交
  6. 26 1月, 2015 1 次提交
  7. 12 12月, 2014 3 次提交
    • M
      net/mlx4: Add A0 hybrid steering · d57febe1
      Matan Barak 提交于
      A0 hybrid steering is a form of high performance flow steering.
      By using this mode, mlx4 cards use a fast limited table based steering,
      in order to enable fast steering of unicast packets to a QP.
      
      In order to implement A0 hybrid steering we allocate resources
      from different zones:
      (1) General range
      (2) Special MAC-assigned QPs [RSS, Raw-Ethernet] each has its own region.
      
      When we create a rss QP or a raw ethernet (A0 steerable and BF ready) QP,
      we try hard to allocate the QP from range (2). Otherwise, we try hard not
      to allocate from this  range. However, when the system is pushed to its
      limits and one needs every resource, the allocator uses every region it can.
      
      Meaning, when we run out of raw-eth qps, the allocator allocates from the
      general range (and the special-A0 area is no longer active). If we run out
      of RSS qps, the mechanism tries to allocate from the raw-eth QP zone. If that
      is also exhausted, the allocator will allocate from the general range
      (and the A0 region is no longer active).
      
      Note that if a raw-eth qp is allocated from the general range, it attempts
      to allocate the range such that bits 6 and 7 (blueflame bits) in the
      QP number are not set.
      
      When the feature is used in SRIOV, the VF has to notify the PF what
      kind of QP attributes it needs. In order to do that, along with the
      "Eth QP blueflame" bit, we reserve a new "A0 steerable QP". According
      to the combination of these bits, the PF tries to allocate a suitable QP.
      
      In order to maintain backward compatibility (with older PFs), the PF
      notifies which QP attributes it supports via QUERY_FUNC_CAP command.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d57febe1
    • E
      net/mlx4: Change QP allocation scheme · ddae0349
      Eugenia Emantayev 提交于
      When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
      in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.
      
      The current Ethernet driver code reserves a Tx QP range with 256b alignment.
      
      This is wrong because if there are more than 64 Tx QPs in use,
      QPNs >= base + 65 will have bits 6/7 set.
      
      This problem is not specific for the Ethernet driver, any entity that
      tries to reserve more than 64 BF-enabled QPs should fail. Also, using
      ranges is not necessary here and is wasteful.
      
      The new mechanism introduced here will support reservation for
      "Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
      (when hypervisors support WC in VMs). The flow we use is:
      
      1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
         and request "BF enabled QPs" if BF is supported for the function
      
      2. In the ALLOC_RES FW command, change param1 to:
      a. param1[23:0]  - number of QPs
      b. param1[31-24] - flags controlling QPs reservation
      
      Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
      bits 6 and 7 unset in order to be used in Ethernet.
      
      Bits 24-30 of the flags are currently reserved.
      
      When a function tries to allocate a QP, it states the required attributes
      for this QP. Those attributes are considered "best-effort". If an attribute,
      such as Ethernet BF enabled QP, is a must-have attribute, the function has
      to check that attribute is supported before trying to do the allocation.
      
      In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
      which are unsupported. If SRIOV is used, the PF validates those attributes
      and masks out unsupported attributes as well. In order to notify VFs which
      attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's
      mailbox is filled by the PF, which notifies which QP allocation attributes
      it supports.
      Signed-off-by: NEugenia Emantayev <eugenia@mellanox.co.il>
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ddae0349
    • O
      net/mlx4_en: Set csum level for encapsulated packets · c58942f2
      Or Gerlitz 提交于
      This was dropped by mistake for the napi_gro_frags flow, fix that.
      
      Fixes: dd65beac ('net/mlx4_en: Extend usage of napi_gro_frags')
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c58942f2
  8. 09 12月, 2014 1 次提交
  9. 24 11月, 2014 1 次提交
    • E
      mlx4: fix mlx4_en_set_rxfh() · bd635c35
      Eric Dumazet 提交于
      mlx4_en_set_rxfh() can crash if no RSS indir table is provided.
      
      While we are at it, allow RSS key to be changed with ethtool -X
      
      Tested:
      
      myhost:~# cat /proc/sys/net/core/netdev_rss_key
      b6:89:91:f3:b2:c3:c2:90:11:e8:ce:45:e8:a9:9d:1c:f2:f6:d4:53:61:8b:26:3a:b3:9a:57:97:c3:b6:79:4d:2e:d9:66:5c:72:ed:b6:8e:c5:5d:4d:8c:22:67:30:ab:8a:6e:c3:6a
      
      myhost:~# ethtool -x eth0
      RX flow hash indirection table for eth0 with 8 RX ring(s):
          0:      0     1     2     3     4     5     6     7
      RSS hash key:
      b6:89:91:f3:b2:c3:c2:90:11:e8:ce:45:e8:a9:9d:1c:f2:f6:d4:53:61:8b:26:3a:b3:9a:57:97:c3:b6:79:4d:2e:d9:66:5c:72:ed:b6:8e
      
      myhost:~# ethtool -X eth0 hkey \
      03:0e:e2:43:fa:82:0e:73:14:2d:c0:68:21:9e:82:99:b9:84:d0:22:e2:b3:64:9f:4a:af:00:fa:cc:05:b4:4a:17:05:14:73:76:58:bd:2f
      
      myhost:~# ethtool -x eth0
      RX flow hash indirection table for eth0 with 8 RX ring(s):
          0:      0     1     2     3     4     5     6     7
      RSS hash key:
      03:0e:e2:43:fa:82:0e:73:14:2d:c0:68:21:9e:82:99:b9:84:d0:22:e2:b3:64:9f:4a:af:00:fa:cc:05:b4:4a:17:05:14:73:76:58:bd:2f
      Reported-by: NBen Hutchings <ben@decadent.org.uk>
      Fixes: b9d1ab7e ("mlx4: use netdev_rss_key_fill() helper")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd635c35
  10. 17 11月, 2014 1 次提交
  11. 12 11月, 2014 2 次提交
  12. 11 11月, 2014 2 次提交
    • E
      mlx4: restore conditional call to napi_complete_done() · 2e1af7d7
      Eric Dumazet 提交于
      After commit 1a288172 ("mlx4: use napi_complete_done()") we ended up
      calling napi_complete_done() in the case NAPI poll consumed all its
      budget.
      
      This added extra interrupt pressure, this patch restores proper
      behavior.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Fixes: 1a288172 ("mlx4: use napi_complete_done()")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e1af7d7
    • E
      mlx4: use napi_complete_done() · 1a288172
      Eric Dumazet 提交于
      To enable gro_flush_timeout, a driver has to use napi_complete_done()
      instead of napi_complete().
      
      Tested:
       Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)
      
      Without this feature, we send back about 305,000 ACK per second.
      
      GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)
      
      Setting a timer of 2000 nsec is enough to increase GRO packet sizes
      and reduce number of ACK packets. (811/19.2 = 42)
      
      Receiver performs less calls to upper stacks, less wakes up.
      This also reduces cpu usage on the sender, as it receives less ACK
      packets.
      
      Note that reducing number of wakes up increases cpu efficiency, but can
      decrease QPS, as applications wont have the chance to warmup cpu caches
      doing a partial read of RPC requests/answers if they fit in one skb.
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00
      0.00      0.50
      
      B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00
      0.00      0.50
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a288172
  13. 04 11月, 2014 2 次提交
  14. 31 10月, 2014 1 次提交
  15. 29 10月, 2014 1 次提交
    • J
      net/mlx4_en: Cleanups suggested by clang static checker · c2a3d4b4
      Jack Morgenstein 提交于
      clang flagged the following. All are actually cosmetic cleanups, not really bugs:
      
      drivers/net/ethernet/mellanox/mlx4/en_main.c:233:3: warning: Value stored to 'err' is never read
                      err = -ENOMEM;
                      ^     ~~~~~~~
      drivers/net/ethernet/mellanox/mlx4/en_main.c:293:3: warning: Value stored to 'err' is never read
                      err = -ENOMEM;
      
      drivers/net/ethernet/mellanox/mlx4/en_netdev.c:648:16: warning: Assigned value is garbage or undefined
              entry->reg_id = reg_id;
                            ^ ~~~~~~
      drivers/net/ethernet/mellanox/mlx4/en_netdev.c:659:2: warning: Function call argument is an uninitialized value
              mlx4_en_uc_steer_release(priv, priv->dev->dev_addr, *qpn, reg_id);
      (NOTE: reg_id is only used in the device-managed flow steering path, in which is it always initialized.
       This is not a bug. Cleanup here is therefore cosmetic only).
      
      drivers/net/ethernet/mellanox/mlx4/en_rx.c:122:3: warning: Value stored to 'frag_info' is never read
                      frag_info = &priv->frag_info[i];
                      ^           ~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c2a3d4b4
  16. 11 10月, 2014 1 次提交
  17. 20 9月, 2014 1 次提交
    • I
      net/mlx4_en: Add mlx4_en_get_cqe helper · b1b6b4da
      Ido Shamay 提交于
      This function derives the base address of the CQE from the CQE size,
      and calculates the real CQE context segment in it from the factor
      (this is like before). Before this change the code used the factor to
      calculate the base address of the CQE as well.
      
      The factor indicates in which segment of the cqe stride the cqe information
      is located. For 32-byte strides, the segment is 0, and for 64 byte strides,
      the segment is 1 (bytes 32..63). Using the factor was ok as long as we had
      only 32 and 64 byte strides. However, with larger strides, the factor is zero,
      and so cannot be used to calculate the base of the CQE.
      
      The helper uses the same method of CQE buffer pulling made by all other
      components that reads the CQE buffer (mlx4_ib driver and libmlx4).
      Signed-off-by: NIdo Shamay <idos@mellanox.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1b6b4da
  18. 06 9月, 2014 1 次提交
  19. 30 8月, 2014 1 次提交
  20. 23 7月, 2014 1 次提交
  21. 15 7月, 2014 1 次提交
    • J
      mlx4: mark napi id for gro_skb · 32b333fe
      Jason Wang 提交于
      Napi id was not marked for gro_skb, this will lead rx busy loop won't
      work correctly since they stack never try to call low latency receive
      method because of a zero socket napi id. Fix this by marking napi id
      for gro_skb.
      
      The transaction rate of 1 byte netperf tcp_rr gets about 50% increased
      (from 20531.68 to 30610.88).
      
      Cc: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32b333fe
  22. 09 7月, 2014 1 次提交
  23. 03 7月, 2014 1 次提交
  24. 03 6月, 2014 1 次提交
  25. 15 5月, 2014 1 次提交
  26. 09 5月, 2014 1 次提交
  27. 15 3月, 2014 1 次提交
  28. 25 2月, 2014 2 次提交
  29. 14 1月, 2014 1 次提交
  30. 01 1月, 2014 1 次提交
    • O
      net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling · 837052d0
      Or Gerlitz 提交于
      When the device tunneling offloads mode is vxlan do the following
      
       - call SET_PORT with the relevant setting
      
       - add DMFS steering vxlan rule for the device self and multicast mac addresses
         of the form: {<ETH, outer-mac> <VXLAN, ANY vnid> <ETH, ANY mac>} --> RSS QP
      
       - set relevant QPC fields in RSS context and RX ring QPs
      
       - in TX flow, set WQE fields to generate HW checksum, and handle gso skbs
         which are marked for encapsulation such that the HW will segment them properly.
      
       - in RX flow, read HW offloaded checksum for encapsulated packets from the CQE
      
       - advertize hw_enc_features and NETIF_F_GSO_UDP_TUNNEL to the networking stack
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      837052d0
  31. 19 12月, 2013 1 次提交
  32. 08 11月, 2013 2 次提交