1. 15 10月, 2015 1 次提交
    • J
      net/mlx4_core: Replace VF zero mac with random mac in mlx4_core · 2b3ddf27
      Jack Morgenstein 提交于
      By design, when no default MAC addresses are set in the Hypervisor for VFs,
      the VFs are passed zero-macs. When such a MAC is received by the VF, it
      generates a random MAC address and registers that MAC address
      with the Hypervisor.
      
      This random mac generation is currently done in the mlx4_en module.
      There is a problem, though, if the mlx4_ib module is loaded by a VF before
      the mlx4_en module. In this case, for RoCE, mlx4_ib will see the un-replaced
      zero-mac and register that zero-mac as part of QP1 initialization.
      
      Having a zero-mac in the port's MAC table creates problems for a
      Baseboard Management Console. The BMC occasionally sends packets with a
      zero-mac destination MAC. If there is a zero-mac present in the port's
      MAC table, the FW will send such BMC packets to the host driver rather than
      to the wire, and BMC will stop working.
      
      To address this problem, we move the replacement of zero-mac addresses
      with random-mac addresses to procedure mlx4_slave_cap(), which is part of the
      driver startup for VFs, and is before activation of mlx4_ib and mlx4_en.
      As a result, zero-mac addresses will never be registered in the port MAC table
      by the driver.
      
      In addition, when mlx4_en does initialize the net device, it needs to set
      the NET_ADDR_RANDOM flag in the netdev structure if the address was
      randomly generated. This is done so that udev on the VM does not create
      a new device name after each VF probe (VM boot and such). To accomplish this,
      we add a per-port flag in mlx4_dev which gets set whenever mlx4_core replaces
      a zero-mac with a randomly-generated mac. This flag is examined when mlx4_en
      initializes the net-device.
      
      Fix was suggested by Matan Barak <matanb@mellanox.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b3ddf27
  2. 09 10月, 2015 1 次提交
  3. 28 7月, 2015 1 次提交
    • H
      net/mlx4_en: Add support for hardware accelerated 802.1ad vlan · e38af4fa
      Hadar Hen Zion 提交于
      To enable device support in accelerated 802.1ad vlan, the port
      capability "packet has vlan enable" (phv_en) should be set.
      Firmware won't work properly, in case phv_en is not set.
      
      The user can enable "phv_en" port capability with the new ethtool
      private flag phv-bit. The phv-bit private flag default value is OFF,
      users who are interested in 802.1ad hardware acceleration should turn ON
      the phv-bit private flag:
      $ ethtool --set-priv-flags eth1 phv-bit on
      
      Once the private flag is set, the device is ready for 802.1ad vlan
      acceleration.
      
      The user should also change the interface device features and turn on
      "tx-vlan-stag-hw-insert" which is off by default:
      $ ethtool -K eth1  tx-vlan-stag-hw-insert on
      
      "phv-bit" private flag setting is available only for Physical
      Functions(PF), the Virtual Function (VF) will be able to use the feature
      by setting "tx-vlan-stag-hw-insert" ethtool device feature only if the
      feature was enabled by the Hypervisor.
      Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e38af4fa
  4. 25 6月, 2015 1 次提交
  5. 16 6月, 2015 3 次提交
  6. 31 5月, 2015 1 次提交
    • M
      net/mlx4: Add EQ pool · c66fa19c
      Matan Barak 提交于
      Previously, mlx4_en allocated EQs and used them exclusively.
      This affected RoCE performance, as applications which are
      events sensitive were limited to use only the legacy EQs.
      
      Change that by introducing an EQ pool. This pool is managed
      by mlx4_core. EQs are assigned to ports (when there are limited
      number of EQs, multiple ports could be assigned to the same EQs).
      
      An exception to this rule is the ASYNC EQ which handles various events.
      
      Legacy EQs are completely removed as all EQs could be shared.
      
      When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for
      EQ serving on a specific port. The core driver calculates which
      EQ should be assigned to that request.
      
      Because IRQs are shared between IB and Ethernet modules, their
      names only include the PCI device BDF address.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NIdo Shamay <idos@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c66fa19c
  7. 28 5月, 2015 1 次提交
    • R
      cpumask_set_cpu_local_first => cpumask_local_spread, lament · f36963c9
      Rusty Russell 提交于
      da91309e (cpumask: Utility function to set n'th cpu...) created a
      genuinely weird function.  I never saw it before, it went through DaveM.
      (He only does this to make us other maintainers feel better about our own
      mistakes.)
      
      cpumask_set_cpu_local_first's purpose is say "I need to spread things
      across N online cpus, choose the ones on this numa node first"; you call
      it in a loop.
      
      It can fail.  One of the two callers ignores this, the other aborts and
      fails the device open.
      
      It can fail in two ways: allocating the off-stack cpumask, or through a
      convoluted codepath which AFAICT can only occur if cpu_online_mask
      changes.  Which shouldn't happen, because if cpu_online_mask can change
      while you call this, it could return a now-offline cpu anyway.
      
      It contains a nonsensical test "!cpumask_of_node(numa_node)".  This was
      drawn to my attention by Geert, who said this causes a warning on Sparc.
      It sets a single bit in a cpumask instead of returning a cpu number,
      because that's what the callers want.
      
      It could be made more efficient by passing the previous cpu rather than
      an index, but that would be more invasive to the callers.
      
      Fixes: da91309e
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (then rebased)
      Tested-by: NAmir Vadai <amirv@mellanox.com>
      Acked-by: NAmir Vadai <amirv@mellanox.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      f36963c9
  8. 01 5月, 2015 2 次提交
  9. 03 4月, 2015 5 次提交
  10. 01 4月, 2015 4 次提交
  11. 30 3月, 2015 1 次提交
  12. 25 3月, 2015 1 次提交
    • I
      net/mlx4_en: Call register_netdevice in the proper location · e5eda89d
      Ido Shamay 提交于
      Netdevice registration should be performed a the end of the driver
      initialization flow. If we don't do that, after calling register_netdevice,
      device callbacks may be issued by higher layers of the stack before
      final configuration of the device is done.
      
      For example (VXLAN configuration race), mlx4_SET_PORT_VXLAN was issued
      after the register_netdev command. System network scripts may configure
      the interface (UP) right after the registration, which also attach
      unicast VXLAN steering rule, before mlx4_SET_PORT_VXLAN was called,
      causing the firmware to fail the rule attachment.
      
      Fixes: 837052d0 ("net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling")
      Signed-off-by: NIdo Shamay <idos@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5eda89d
  13. 19 3月, 2015 3 次提交
  14. 04 3月, 2015 1 次提交
  15. 05 2月, 2015 2 次提交
  16. 26 1月, 2015 1 次提交
  17. 16 1月, 2015 1 次提交
  18. 27 12月, 2014 1 次提交
    • J
      net: Generalize ndo_gso_check to ndo_features_check · 5f35227e
      Jesse Gross 提交于
      GSO isn't the only offload feature with restrictions that
      potentially can't be expressed with the current features mechanism.
      Checksum is another although it's a general issue that could in
      theory apply to anything. Even if it may be possible to
      implement these restrictions in other ways, it can result in
      duplicate code or inefficient per-packet behavior.
      
      This generalizes ndo_gso_check so that drivers can remove any
      features that don't make sense for a given packet, similar to
      netif_skb_features(). It also converts existing driver
      restrictions to the new format, completing the work that was
      done to support tunnel protocols since the issues apply to
      checksums as well.
      
      By actually removing features from the set that are used to do
      offloading, it solves another problem with the existing
      interface. In these cases, GSO would run with the original set
      of features and not do anything because it appears that
      segmentation is not required.
      
      CC: Tom Herbert <therbert@google.com>
      CC: Joe Stringer <joestringer@nicira.com>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      Acked-by: NTom Herbert <therbert@google.com>
      Fixes: 04ffcb25 ("net: Add ndo_gso_check")
      Tested-by: NHayes Wang <hayeswang@realtek.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f35227e
  19. 17 12月, 2014 1 次提交
  20. 12 12月, 2014 3 次提交
    • M
      net/mlx4: Add support for A0 steering · 7d077cd3
      Matan Barak 提交于
      Add the required firmware commands for A0 steering and a way to enable
      that. The firmware support focuses on INIT_HCA, QUERY_HCA, QUERY_PORT,
      QUERY_DEV_CAP and QUERY_FUNC_CAP commands. Those commands are used
      to configure and query the device.
      
      The different A0 DMFS (steering) modes are:
      
      Static - optimized performance, but flow steering rules are
      limited. This mode should be choosed explicitly by the user
      in order to be used.
      
      Dynamic - this mode should be explicitly choosed by the user.
      In this mode, the FW works in optimized steering mode as long as
      it can and afterwards automatically drops to classic (full) DMFS.
      
      Disable - this mode should be explicitly choosed by the user.
      The user instructs the system not to use optimized steering, even if
      the FW supports Dynamic A0 DMFS (and thus will be able to use optimized
      steering in Default A0 DMFS mode).
      
      Default - this mode is implicitly choosed. In this mode, if the FW
      supports Dynamic A0 DMFS, it'll work in this mode. Otherwise, it'll
      work at Disable A0 DMFS mode.
      
      Under SRIOV configuration, when the A0 steering mode is enabled,
      older guest VF drivers who aren't using the RX QP allocation flag
      (MLX4_RESERVE_A0_QP) will get a QP from the general range and
      fail when attempting to register a steering rule. To avoid that,
      the PF context behaviour is changed once on A0 static mode, to
      require support for the allocation flag in VF drivers too.
      
      In order to enable A0 steering, we use log_num_mgm_entry_size param.
      If the value of the parameter is not positive, we treat the absolute
      value of log_num_mgm_entry_size as a bit field. Setting bit 2 of this
      bit field enables static A0 steering.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d077cd3
    • M
      net/mlx4: Add A0 hybrid steering · d57febe1
      Matan Barak 提交于
      A0 hybrid steering is a form of high performance flow steering.
      By using this mode, mlx4 cards use a fast limited table based steering,
      in order to enable fast steering of unicast packets to a QP.
      
      In order to implement A0 hybrid steering we allocate resources
      from different zones:
      (1) General range
      (2) Special MAC-assigned QPs [RSS, Raw-Ethernet] each has its own region.
      
      When we create a rss QP or a raw ethernet (A0 steerable and BF ready) QP,
      we try hard to allocate the QP from range (2). Otherwise, we try hard not
      to allocate from this  range. However, when the system is pushed to its
      limits and one needs every resource, the allocator uses every region it can.
      
      Meaning, when we run out of raw-eth qps, the allocator allocates from the
      general range (and the special-A0 area is no longer active). If we run out
      of RSS qps, the mechanism tries to allocate from the raw-eth QP zone. If that
      is also exhausted, the allocator will allocate from the general range
      (and the A0 region is no longer active).
      
      Note that if a raw-eth qp is allocated from the general range, it attempts
      to allocate the range such that bits 6 and 7 (blueflame bits) in the
      QP number are not set.
      
      When the feature is used in SRIOV, the VF has to notify the PF what
      kind of QP attributes it needs. In order to do that, along with the
      "Eth QP blueflame" bit, we reserve a new "A0 steerable QP". According
      to the combination of these bits, the PF tries to allocate a suitable QP.
      
      In order to maintain backward compatibility (with older PFs), the PF
      notifies which QP attributes it supports via QUERY_FUNC_CAP command.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d57febe1
    • E
      net/mlx4: Change QP allocation scheme · ddae0349
      Eugenia Emantayev 提交于
      When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
      in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.
      
      The current Ethernet driver code reserves a Tx QP range with 256b alignment.
      
      This is wrong because if there are more than 64 Tx QPs in use,
      QPNs >= base + 65 will have bits 6/7 set.
      
      This problem is not specific for the Ethernet driver, any entity that
      tries to reserve more than 64 BF-enabled QPs should fail. Also, using
      ranges is not necessary here and is wasteful.
      
      The new mechanism introduced here will support reservation for
      "Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
      (when hypervisors support WC in VMs). The flow we use is:
      
      1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
         and request "BF enabled QPs" if BF is supported for the function
      
      2. In the ALLOC_RES FW command, change param1 to:
      a. param1[23:0]  - number of QPs
      b. param1[31-24] - flags controlling QPs reservation
      
      Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
      bits 6 and 7 unset in order to be used in Ethernet.
      
      Bits 24-30 of the flags are currently reserved.
      
      When a function tries to allocate a QP, it states the required attributes
      for this QP. Those attributes are considered "best-effort". If an attribute,
      such as Ethernet BF enabled QP, is a must-have attribute, the function has
      to check that attribute is supported before trying to do the allocation.
      
      In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
      which are unsupported. If SRIOV is used, the PF validates those attributes
      and masks out unsupported attributes as well. In order to notify VFs which
      attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's
      mailbox is filled by the PF, which notifies which QP allocation attributes
      it supports.
      Signed-off-by: NEugenia Emantayev <eugenia@mellanox.co.il>
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ddae0349
  21. 09 12月, 2014 1 次提交
  22. 03 12月, 2014 1 次提交
  23. 24 11月, 2014 1 次提交
    • E
      mlx4: fix mlx4_en_set_rxfh() · bd635c35
      Eric Dumazet 提交于
      mlx4_en_set_rxfh() can crash if no RSS indir table is provided.
      
      While we are at it, allow RSS key to be changed with ethtool -X
      
      Tested:
      
      myhost:~# cat /proc/sys/net/core/netdev_rss_key
      b6:89:91:f3:b2:c3:c2:90:11:e8:ce:45:e8:a9:9d:1c:f2:f6:d4:53:61:8b:26:3a:b3:9a:57:97:c3:b6:79:4d:2e:d9:66:5c:72:ed:b6:8e:c5:5d:4d:8c:22:67:30:ab:8a:6e:c3:6a
      
      myhost:~# ethtool -x eth0
      RX flow hash indirection table for eth0 with 8 RX ring(s):
          0:      0     1     2     3     4     5     6     7
      RSS hash key:
      b6:89:91:f3:b2:c3:c2:90:11:e8:ce:45:e8:a9:9d:1c:f2:f6:d4:53:61:8b:26:3a:b3:9a:57:97:c3:b6:79:4d:2e:d9:66:5c:72:ed:b6:8e
      
      myhost:~# ethtool -X eth0 hkey \
      03:0e:e2:43:fa:82:0e:73:14:2d:c0:68:21:9e:82:99:b9:84:d0:22:e2:b3:64:9f:4a:af:00:fa:cc:05:b4:4a:17:05:14:73:76:58:bd:2f
      
      myhost:~# ethtool -x eth0
      RX flow hash indirection table for eth0 with 8 RX ring(s):
          0:      0     1     2     3     4     5     6     7
      RSS hash key:
      03:0e:e2:43:fa:82:0e:73:14:2d:c0:68:21:9e:82:99:b9:84:d0:22:e2:b3:64:9f:4a:af:00:fa:cc:05:b4:4a:17:05:14:73:76:58:bd:2f
      Reported-by: NBen Hutchings <ben@decadent.org.uk>
      Fixes: b9d1ab7e ("mlx4: use netdev_rss_key_fill() helper")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd635c35
  24. 20 11月, 2014 1 次提交
    • O
      net/mlx4_en: Add VXLAN ndo calls to the PF net device ops too · 9737c6ab
      Or Gerlitz 提交于
      This is currently missing, which results in a crash when one attempts
      to set VXLAN tunnel over the mlx4_en when acting as PF.
      
      	[ 2408.785472] BUG: unable to handle kernel NULL pointer dereference at (null)
      	[...]
      	[ 2408.994104] Call Trace:
      	[ 2408.996584]  [<ffffffffa021f7f5>] ? vxlan_get_rx_port+0xd6/0x103 [vxlan]
      	[ 2409.003316]  [<ffffffffa021f71f>] ? vxlan_lowerdev_event+0xf2/0xf2 [vxlan]
      	[ 2409.010225]  [<ffffffffa0630358>] mlx4_en_start_port+0x862/0x96a [mlx4_en]
      	[ 2409.017132]  [<ffffffffa063070f>] mlx4_en_open+0x17f/0x1b8 [mlx4_en]
      
      While here, make sure to invoke vxlan_get_rx_port() only when VXLAN
      offloads are actually enabled and not when they are only supported.
      Reported-by: NIdo Shamay <idos@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9737c6ab
  25. 15 11月, 2014 1 次提交