1. 12 Dec 2014, 4 commits
    • net/mlx4: Change QP allocation scheme · ddae0349
      Eugenia Emantayev authored
      When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
      in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.
      
      The current Ethernet driver code reserves a Tx QP range with 256b alignment.
      
      This is wrong because if there are more than 64 Tx QPs in use,
      QPNs >= base + 65 will have bits 6/7 set.
      
      This problem is not specific to the Ethernet driver; any entity that
      tries to reserve more than 64 BF-enabled QPs should fail. Also, using
      ranges is not necessary here and is wasteful.
      
      The new mechanism introduced here will support reservation for
      "Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
      (when hypervisors support WC in VMs). The flow we use is:
      
      1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
         and request "BF enabled QPs" if BF is supported for the function
      
      2. In the ALLOC_RES FW command, change param1 to:
      a. param1[23:0]  - number of QPs
      b. param1[31:24] - flags controlling QP reservation
      
      Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
      bits 6 and 7 unset in order to be used in Ethernet.
      
      Bits 24-30 of the flags are currently reserved.
      
      When a function tries to allocate a QP, it states the required attributes
      for this QP. Those attributes are considered "best-effort". If an attribute,
      such as an Ethernet BF enabled QP, is a must-have attribute, the function has
      to check that the attribute is supported before attempting the allocation.
      
      In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
      which are unsupported. If SRIOV is used, the PF validates those attributes
      and masks out unsupported attributes as well. To learn which attributes
      are supported, the VF uses the QUERY_FUNC_CAP command; its mailbox is
      filled by the PF, which reports the QP allocation attributes it supports.
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ddae0349
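      A minimal sketch, in C, of the two constraints described in this commit:
      a QPN is Blue-Flame eligible only if bits 6 and 7 are unset, and the
      ALLOC_RES param1 word carries the QP count in bits 23:0 and reservation
      flags in bits 31:24. The helper names and the flag macro are assumptions
      for illustration, not the driver's actual definitions.

      /* Sketch only: bit constraints from the commit message above. */
      #include <stdbool.h>
      #include <stdint.h>

      /* BF may only be used for QPNs with bits 6 and 7 unset. */
      static bool qpn_is_bf_eligible(uint32_t qpn)
      {
              return (qpn & 0xc0) == 0;
      }

      /* param1[23:0]  - number of QPs
       * param1[31:24] - reservation flags (bit 31 = Eth BF eligible QPs)
       */
      #define QP_RESERVE_FLAG_ETH_BF  (1u << 31)   /* invented name */

      static uint32_t build_alloc_res_param1(uint32_t count, uint32_t flags)
      {
              return (count & 0x00ffffff) | (flags & 0xff000000);
      }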
    • net/mlx4_core: Use tasklet for user-space CQ completion events · 3dca0f42
      Matan Barak authored
      Previously, we've fired all our completion callbacks straight from our ISR.
      
      Some of those callbacks were lightweight (for example, mlx4_en's and
      IPoIB napi callbacks), but some of them did more work (for example,
      the user-space RDMA stack uverbs' completion handler). Besides that,
      doing more than the minimal work in an ISR is generally considered wrong;
      it could even lead to a hard lockup of the system. When the hardware
      generates a lot of completion events, the loop over those events can run
      long enough that the system watchdog detects a hard lockup.
      
      In order to avoid that, add a new way of invoking completion event
      callbacks. In the interrupt itself, we add the CQs that received a
      completion event to a per-EQ list and schedule a tasklet. In the tasklet
      context we loop over all the CQs in the list and invoke the user callback.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3dca0f42
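      A rough sketch of the flow described in this commit, using the classic
      Linux tasklet API of that era: the ISR only queues the CQ on a per-EQ
      list and schedules a tasklet, and the tasklet runs the (possibly heavy)
      user callbacks. Structure layout and names are assumptions, not mlx4's
      actual code.

      /* Sketch: defer CQ completion callbacks from the ISR to a tasklet. */
      #include <linux/interrupt.h>
      #include <linux/list.h>
      #include <linux/spinlock.h>

      struct demo_cq {
              struct list_head tasklet_entry;   /* assumed INIT_LIST_HEAD()-ed at CQ creation */
              void (*comp)(struct demo_cq *cq); /* user completion callback */
      };

      struct demo_eq {
              spinlock_t lock;
              struct list_head cq_list;         /* CQs with pending completion events */
              struct tasklet_struct tasklet;    /* tasklet_init(..., demo_eq_tasklet, (unsigned long)eq) */
      };

      /* ISR path: just remember the CQ and kick the tasklet. */
      static void demo_eq_completion_isr(struct demo_eq *eq, struct demo_cq *cq)
      {
              unsigned long flags;

              spin_lock_irqsave(&eq->lock, flags);
              if (list_empty(&cq->tasklet_entry))
                      list_add_tail(&cq->tasklet_entry, &eq->cq_list);
              spin_unlock_irqrestore(&eq->lock, flags);

              tasklet_schedule(&eq->tasklet);
      }

      /* Tasklet context: loop over the queued CQs and invoke the callbacks. */
      static void demo_eq_tasklet(unsigned long data)
      {
              struct demo_eq *eq = (struct demo_eq *)data;
              struct demo_cq *cq, *tmp;
              unsigned long flags;
              LIST_HEAD(done);

              spin_lock_irqsave(&eq->lock, flags);
              list_splice_init(&eq->cq_list, &done);
              spin_unlock_irqrestore(&eq->lock, flags);

              list_for_each_entry_safe(cq, tmp, &done, tasklet_entry) {
                      list_del_init(&cq->tasklet_entry);
                      cq->comp(cq);
              }
      }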
    • net/mlx4_core: Mask out host side virtualization features for guests · 383677da
      Or Gerlitz authored
      When VFs (guests in this context) issue the QUERY_DEV_CAP command, they
      need not be told that host side virtualization features such as VST, FSM
      (MAC anti-spoofing) and running > 80 VFs are supported by the device.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      383677da
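      Conceptually the change is a mask applied by the PF before device
      capabilities are returned to a VF. A minimal sketch, with invented flag
      names and bit positions:

      /* Sketch: hide host-side virtualization capabilities from guests (VFs). */
      #include <stdbool.h>
      #include <stdint.h>

      #define DEMO_CAP_VST      (1ull << 10)  /* VLAN Switch Tagging */
      #define DEMO_CAP_FSM      (1ull << 11)  /* MAC anti-spoofing */
      #define DEMO_CAP_MANY_VFS (1ull << 12)  /* running > 80 VFs */

      static uint64_t filter_dev_caps_for_guest(uint64_t caps, bool is_vf)
      {
              if (is_vf)
                      caps &= ~(DEMO_CAP_VST | DEMO_CAP_FSM | DEMO_CAP_MANY_VFS);
              return caps;
      }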
    • net/mlx4_en: Set csum level for encapsulated packets · c58942f2
      Or Gerlitz authored
      Setting the csum level was dropped by mistake from the napi_gro_frags flow; fix that.
      
      Fixes: dd65beac ('net/mlx4_en: Extend usage of napi_gro_frags')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c58942f2
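      The gist of the fix, as a hedged sketch: when the hardware has validated
      the inner checksum of an encapsulated (e.g. VXLAN) packet received via
      napi_gro_frags, reflect that on the skb through csum_level. The helper
      name and the l2_tunnel condition are assumptions, not the exact driver
      code.

      /* Sketch: mark outer and inner checksums as verified on RX. */
      #include <linux/skbuff.h>

      static void demo_set_rx_csum(struct sk_buff *skb, bool l2_tunnel)
      {
              skb->ip_summed = CHECKSUM_UNNECESSARY;
              if (l2_tunnel)
                      skb->csum_level = 1;  /* inner (encapsulated) checksum verified too */
      }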
  2. 09 Dec 2014, 8 commits
  3. 03 Dec 2014, 1 commit
  4. 27 Nov 2014, 1 commit
  5. 24 Nov 2014, 1 commit
    • mlx4: fix mlx4_en_set_rxfh() · bd635c35
      Eric Dumazet authored
      mlx4_en_set_rxfh() can crash if no RSS indir table is provided.
      
      While we are at it, allow the RSS key to be changed with ethtool -X.
      
      Tested:
      
      myhost:~# cat /proc/sys/net/core/netdev_rss_key
      b6:89:91:f3:b2:c3:c2:90:11:e8:ce:45:e8:a9:9d:1c:f2:f6:d4:53:61:8b:26:3a:b3:9a:57:97:c3:b6:79:4d:2e:d9:66:5c:72:ed:b6:8e:c5:5d:4d:8c:22:67:30:ab:8a:6e:c3:6a
      
      myhost:~# ethtool -x eth0
      RX flow hash indirection table for eth0 with 8 RX ring(s):
          0:      0     1     2     3     4     5     6     7
      RSS hash key:
      b6:89:91:f3:b2:c3:c2:90:11:e8:ce:45:e8:a9:9d:1c:f2:f6:d4:53:61:8b:26:3a:b3:9a:57:97:c3:b6:79:4d:2e:d9:66:5c:72:ed:b6:8e
      
      myhost:~# ethtool -X eth0 hkey \
      03:0e:e2:43:fa:82:0e:73:14:2d:c0:68:21:9e:82:99:b9:84:d0:22:e2:b3:64:9f:4a:af:00:fa:cc:05:b4:4a:17:05:14:73:76:58:bd:2f
      
      myhost:~# ethtool -x eth0
      RX flow hash indirection table for eth0 with 8 RX ring(s):
          0:      0     1     2     3     4     5     6     7
      RSS hash key:
      03:0e:e2:43:fa:82:0e:73:14:2d:c0:68:21:9e:82:99:b9:84:d0:22:e2:b3:64:9f:4a:af:00:fa:cc:05:b4:4a:17:05:14:73:76:58:bd:2f
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Fixes: b9d1ab7e ("mlx4: use netdev_rss_key_fill() helper")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd635c35
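      The shape of the fix, sketched against an ethtool .set_rxfh handler:
      ethtool may pass NULL for the parts the user did not change, so the
      indirection table must be guarded, and a non-NULL key should be applied.
      The demo_ names are invented, and the exact .set_rxfh prototype changed
      across kernel versions, so treat this only as the pattern.

      /* Sketch: NULL-guard the indirection table and accept a new RSS key. */
      #include <linux/netdevice.h>
      #include <linux/string.h>

      #define DEMO_RSS_KEY_SIZE  40
      #define DEMO_INDIR_SIZE    128

      struct demo_priv {
              u32 indir_table[DEMO_INDIR_SIZE];
              u8  rss_key[DEMO_RSS_KEY_SIZE];
      };

      static int demo_set_rxfh(struct net_device *dev, const u32 *indir,
                               const u8 *key)
      {
              struct demo_priv *priv = netdev_priv(dev);

              if (indir)   /* the pre-fix code dereferenced this unconditionally */
                      memcpy(priv->indir_table, indir, sizeof(priv->indir_table));
              if (key)     /* new: let "ethtool -X ... hkey ..." replace the key */
                      memcpy(priv->rss_key, key, sizeof(priv->rss_key));

              /* a real driver would now reprogram the HW RSS context */
              return 0;
      }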
  6. 22 Nov 2014, 3 commits
  7. 20 Nov 2014, 1 commit
    • net/mlx4_en: Add VXLAN ndo calls to the PF net device ops too · 9737c6ab
      Or Gerlitz authored
      This is currently missing, which results in a crash when one attempts
      to set a VXLAN tunnel over an mlx4_en device acting as PF.
      
      	[ 2408.785472] BUG: unable to handle kernel NULL pointer dereference at (null)
      	[...]
      	[ 2408.994104] Call Trace:
      	[ 2408.996584]  [<ffffffffa021f7f5>] ? vxlan_get_rx_port+0xd6/0x103 [vxlan]
      	[ 2409.003316]  [<ffffffffa021f71f>] ? vxlan_lowerdev_event+0xf2/0xf2 [vxlan]
      	[ 2409.010225]  [<ffffffffa0630358>] mlx4_en_start_port+0x862/0x96a [mlx4_en]
      	[ 2409.017132]  [<ffffffffa063070f>] mlx4_en_open+0x17f/0x1b8 [mlx4_en]
      
      While here, make sure to invoke vxlan_get_rx_port() only when VXLAN
      offloads are actually enabled and not when they are only supported.
      Reported-by: Ido Shamay <idos@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9737c6ab
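      A sketch of the two pieces of this fix: the PF's net_device_ops must
      also carry the VXLAN port callbacks (so that vxlan_get_rx_port() has a
      valid callback to invoke), and vxlan_get_rx_port() should only be called
      when the offload is actually enabled. The demo_ callbacks and the
      boolean flag are placeholders for the driver's real handlers and state.

      #include <linux/netdevice.h>
      #include <net/vxlan.h>

      static void demo_add_vxlan_port(struct net_device *dev,
                                      sa_family_t sa_family, __be16 port)
      {
              /* program the HW to parse this UDP port as VXLAN */
      }

      static void demo_del_vxlan_port(struct net_device *dev,
                                      sa_family_t sa_family, __be16 port)
      {
              /* remove the UDP port from the HW VXLAN table */
      }

      /* 1) The PF netdev ops need the VXLAN callbacks too; without them,
       *    vxlan_get_rx_port() ends up calling a NULL callback, as in the
       *    trace above. */
      static const struct net_device_ops demo_netdev_ops_pf = {
              /* ... the usual ops ... */
              .ndo_add_vxlan_port = demo_add_vxlan_port,
              .ndo_del_vxlan_port = demo_del_vxlan_port,
      };

      /* 2) Replay known VXLAN ports only when the offload is enabled,
       *    not merely supported by the device. */
      static void demo_start_port(struct net_device *dev, bool vxlan_offload_enabled)
      {
              if (vxlan_offload_enabled)
                      vxlan_get_rx_port(dev);
      }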
  8. 17 Nov 2014, 1 commit
  9. 15 Nov 2014, 1 commit
  10. 14 Nov 2014, 7 commits
  11. 12 Nov 2014, 3 commits
  12. 11 Nov 2014, 2 commits
    • mlx4: restore conditional call to napi_complete_done() · 2e1af7d7
      Eric Dumazet authored
      After commit 1a288172 ("mlx4: use napi_complete_done()") we ended up
      calling napi_complete_done() even when the NAPI poll had consumed all of
      its budget.
      
      This added extra interrupt pressure; this patch restores the proper
      behavior.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Fixes: 1a288172 ("mlx4: use napi_complete_done()")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2e1af7d7
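      A sketch of the NAPI poll pattern that this commit and the one below
      converge on: complete NAPI, reporting the amount of work done (which is
      what lets gro_flush_timeout work), only when the budget was not fully
      consumed. demo_process_rx() and demo_arm_irq() are placeholders for the
      driver's RX ring processing and interrupt re-arming.

      #include <linux/netdevice.h>

      static int demo_process_rx(struct napi_struct *napi, int budget)
      {
              /* a real driver would drain the RX ring here */
              return budget / 2;  /* pretend we handled some packets */
      }

      static void demo_arm_irq(struct napi_struct *napi)
      {
              /* re-enable the completion interrupt for this ring */
      }

      static int demo_napi_poll(struct napi_struct *napi, int budget)
      {
              int done = demo_process_rx(napi, budget);

              /* Budget exhausted: stay in polling mode, do not complete NAPI. */
              if (done == budget)
                      return budget;

              /* Otherwise report the work done (enables gro_flush_timeout)
               * and re-arm the interrupt. */
              napi_complete_done(napi, done);
              demo_arm_irq(napi);
              return done;
      }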
    • mlx4: use napi_complete_done() · 1a288172
      Eric Dumazet authored
      To enable gro_flush_timeout, a driver has to use napi_complete_done()
      instead of napi_complete().
      
      Tested:
       Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)
      
      Without this feature, we send back about 305,000 ACKs per second.
      
      GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)
      
      Setting a timer of 2000 nsec is enough to increase GRO packet sizes
      and reduce the number of ACK packets (811/19.2 = 42 segments per GRO packet).
      
      The receiver performs fewer calls into the upper stacks and fewer wakeups.
      This also reduces CPU usage on the sender, as it receives fewer ACK
      packets.
      
      Note that reducing the number of wakeups increases CPU efficiency, but can
      decrease QPS, as applications won't have the chance to warm up CPU caches
      by doing a partial read of RPC requests/answers if they fit in one skb.
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50
      
      B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a288172
  13. 07 Nov 2014, 2 commits
  14. 04 Nov 2014, 5 commits