1. 20 7月, 2016 1 次提交
    • B
      net/mlx4_en: add support for fast rx drop bpf program · 47a38e15
      Brenden Blanco 提交于
      Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver.
      
      In tc/socket bpf programs, helpers linearize skb fragments as needed
      when the program touches the packet data. However, in the pursuit of
      speed, XDP programs will not be allowed to use these slower functions,
      especially if it involves allocating an skb.
      
      Therefore, disallow MTU settings that would produce a multi-fragment
      packet that XDP programs would fail to access. Future enhancements could
      be done to increase the allowable MTU.
      
      The xdp program is present as a per-ring data structure, but as of yet
      it is not possible to set at that granularity through any ndo.
      Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47a38e15
  2. 24 6月, 2016 1 次提交
  3. 23 6月, 2016 2 次提交
  4. 18 6月, 2016 1 次提交
  5. 17 6月, 2016 1 次提交
  6. 10 6月, 2016 1 次提交
  7. 26 5月, 2016 4 次提交
  8. 06 5月, 2016 1 次提交
    • H
      net/mlx4: Avoid wrong virtual mappings · 73898db0
      Haggai Abramovsky 提交于
      The dma_alloc_coherent() function returns a virtual address which can
      be used for coherent access to the underlying memory.  On some
      architectures, like arm64, undefined behavior results if this memory is
      also accessed via virtual mappings that are not coherent.  Because of
      their undefined nature, operations like virt_to_page() return garbage
      when passed virtual addresses obtained from dma_alloc_coherent().  Any
      subsequent mappings via vmap() of the garbage page values are unusable
      and result in bad things like bus errors (synchronous aborts in ARM64
      speak).
      
      The mlx4 driver contains code that does the equivalent of:
      vmap(virt_to_page(dma_alloc_coherent)), this results in an OOPs when the
      device is opened.
      
      Prevent Ethernet driver to run this problematic code by forcing it to
      allocate contiguous memory. As for the Infiniband driver, at first we
      are trying to allocate contiguous memory, but in case of failure roll
      back to work with fragmented memory.
      Signed-off-by: NHaggai Abramovsky <hagaya@mellanox.com>
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Reported-by: NDavid Daney <david.daney@cavium.com>
      Tested-by: NSinan Kaya <okaya@codeaurora.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73898db0
  9. 05 5月, 2016 2 次提交
  10. 22 4月, 2016 1 次提交
  11. 04 3月, 2016 1 次提交
    • J
      net: relax setup_tc ndo op handle restriction · 5eb4dce3
      John Fastabend 提交于
      I added this check in setup_tc to multiple drivers,
      
       if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
      
      Unfortunately restricting to TC_H_ROOT like this breaks the old
      instantiation of mqprio to setup a hardware qdisc. This patch
      relaxes the test to only check the type to make it equivalent
      to the check before I broke it. With this the old instantiation
      continues to work.
      
      A good smoke test is to setup mqprio with,
      
      # tc qdisc add dev eth4 root mqprio num_tc 8 \
        map 0 1 2 3 4 5 6 7 \
        queues 0@0 1@1 2@2 3@3 4@4 5@5 6@6 7@7
      
      Fixes: e4c6734e ("net: rework ndo tc op to consume additional qdisc handle paramete")
      Reported-by: NSingh Krishneil <krishneil.k.singh@intel.com>
      Reported-by: NJake Keller <jacob.e.keller@intel.com>
      CC: Murali Karicheri <m-karicheri2@ti.com>
      CC: Shradha Shah <sshah@solarflare.com>
      CC: Or Gerlitz <ogerlitz@mellanox.com>
      CC: Ariel Elior <ariel.elior@qlogic.com>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Bruce Allan <bruce.w.allan@intel.com>
      CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
      CC: Don Skidmore <donald.c.skidmore@intel.com>
      Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5eb4dce3
  12. 03 3月, 2016 1 次提交
    • J
      net/mlx4_core: Allow resetting VF admin mac to zero · 6e522422
      Jack Morgenstein 提交于
      The VF administrative mac addresses (stored in the PF driver) are
      initialized to zero when the PF driver starts up.
      
      These addresses may be modified in the PF driver through ndo calls
      initiated by iproute2 or libvirt.
      
      While we allow the PF/host to change the VF admin mac address from zero
      to a valid unicast mac, we do not allow restoring the VF admin mac to
      zero. We currently only allow changing this mac to a different unicast mac.
      
      This leads to problems when libvirt scripts are used to deal with
      VF mac addresses, and libvirt attempts to revoke the mac so this
      host will not use it anymore.
      
      Fix this by allowing resetting a VF administrative MAC back to zero.
      
      Fixes: 8f7ba3ca ('net/mlx4: Add set VF mac address support')
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Reported-by: NMoshe Levi <moshele@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e522422
  13. 02 3月, 2016 1 次提交
  14. 17 2月, 2016 3 次提交
  15. 19 12月, 2015 2 次提交
  16. 19 11月, 2015 1 次提交
    • E
      mlx4: remove mlx4_en_low_latency_recv() · 868fdb06
      Eric Dumazet 提交于
      Busy polling can now be handled in generic NAPI poll infrastructure.
      This removes complexity and fast path overhead :
      
      mlx4 used two spin_lock()/spin_unlock() pair per napi->poll() call
      in mlx4_en_cq_lock_napi()/mlx4_en_cq_unlock_napi()
      
      Tested:
      
      Without busy polling :
      
      lpaa23:~# echo 0 >/proc/sys/net/core/busy_read
      lpaa24:~# echo 0 >/proc/sys/net/core/busy_read
      lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    47330.78
      
      With busy polling :
      
      lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
      lpaa24:~# echo 70 >/proc/sys/net/core/busy_read
      lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    97643.55
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      868fdb06
  17. 15 10月, 2015 1 次提交
    • J
      net/mlx4_core: Replace VF zero mac with random mac in mlx4_core · 2b3ddf27
      Jack Morgenstein 提交于
      By design, when no default MAC addresses are set in the Hypervisor for VFs,
      the VFs are passed zero-macs. When such a MAC is received by the VF, it
      generates a random MAC address and registers that MAC address
      with the Hypervisor.
      
      This random mac generation is currently done in the mlx4_en module.
      There is a problem, though, if the mlx4_ib module is loaded by a VF before
      the mlx4_en module. In this case, for RoCE, mlx4_ib will see the un-replaced
      zero-mac and register that zero-mac as part of QP1 initialization.
      
      Having a zero-mac in the port's MAC table creates problems for a
      Baseboard Management Console. The BMC occasionally sends packets with a
      zero-mac destination MAC. If there is a zero-mac present in the port's
      MAC table, the FW will send such BMC packets to the host driver rather than
      to the wire, and BMC will stop working.
      
      To address this problem, we move the replacement of zero-mac addresses
      with random-mac addresses to procedure mlx4_slave_cap(), which is part of the
      driver startup for VFs, and is before activation of mlx4_ib and mlx4_en.
      As a result, zero-mac addresses will never be registered in the port MAC table
      by the driver.
      
      In addition, when mlx4_en does initialize the net device, it needs to set
      the NET_ADDR_RANDOM flag in the netdev structure if the address was
      randomly generated. This is done so that udev on the VM does not create
      a new device name after each VF probe (VM boot and such). To accomplish this,
      we add a per-port flag in mlx4_dev which gets set whenever mlx4_core replaces
      a zero-mac with a randomly-generated mac. This flag is examined when mlx4_en
      initializes the net-device.
      
      Fix was suggested by Matan Barak <matanb@mellanox.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b3ddf27
  18. 09 10月, 2015 1 次提交
  19. 28 7月, 2015 1 次提交
    • H
      net/mlx4_en: Add support for hardware accelerated 802.1ad vlan · e38af4fa
      Hadar Hen Zion 提交于
      To enable device support in accelerated 802.1ad vlan, the port
      capability "packet has vlan enable" (phv_en) should be set.
      Firmware won't work properly, in case phv_en is not set.
      
      The user can enable "phv_en" port capability with the new ethtool
      private flag phv-bit. The phv-bit private flag default value is OFF,
      users who are interested in 802.1ad hardware acceleration should turn ON
      the phv-bit private flag:
      $ ethtool --set-priv-flags eth1 phv-bit on
      
      Once the private flag is set, the device is ready for 802.1ad vlan
      acceleration.
      
      The user should also change the interface device features and turn on
      "tx-vlan-stag-hw-insert" which is off by default:
      $ ethtool -K eth1  tx-vlan-stag-hw-insert on
      
      "phv-bit" private flag setting is available only for Physical
      Functions(PF), the Virtual Function (VF) will be able to use the feature
      by setting "tx-vlan-stag-hw-insert" ethtool device feature only if the
      feature was enabled by the Hypervisor.
      Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e38af4fa
  20. 25 6月, 2015 1 次提交
  21. 16 6月, 2015 3 次提交
  22. 31 5月, 2015 1 次提交
    • M
      net/mlx4: Add EQ pool · c66fa19c
      Matan Barak 提交于
      Previously, mlx4_en allocated EQs and used them exclusively.
      This affected RoCE performance, as applications which are
      events sensitive were limited to use only the legacy EQs.
      
      Change that by introducing an EQ pool. This pool is managed
      by mlx4_core. EQs are assigned to ports (when there are limited
      number of EQs, multiple ports could be assigned to the same EQs).
      
      An exception to this rule is the ASYNC EQ which handles various events.
      
      Legacy EQs are completely removed as all EQs could be shared.
      
      When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for
      EQ serving on a specific port. The core driver calculates which
      EQ should be assigned to that request.
      
      Because IRQs are shared between IB and Ethernet modules, their
      names only include the PCI device BDF address.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NIdo Shamay <idos@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c66fa19c
  23. 28 5月, 2015 1 次提交
    • R
      cpumask_set_cpu_local_first => cpumask_local_spread, lament · f36963c9
      Rusty Russell 提交于
      da91309e (cpumask: Utility function to set n'th cpu...) created a
      genuinely weird function.  I never saw it before, it went through DaveM.
      (He only does this to make us other maintainers feel better about our own
      mistakes.)
      
      cpumask_set_cpu_local_first's purpose is say "I need to spread things
      across N online cpus, choose the ones on this numa node first"; you call
      it in a loop.
      
      It can fail.  One of the two callers ignores this, the other aborts and
      fails the device open.
      
      It can fail in two ways: allocating the off-stack cpumask, or through a
      convoluted codepath which AFAICT can only occur if cpu_online_mask
      changes.  Which shouldn't happen, because if cpu_online_mask can change
      while you call this, it could return a now-offline cpu anyway.
      
      It contains a nonsensical test "!cpumask_of_node(numa_node)".  This was
      drawn to my attention by Geert, who said this causes a warning on Sparc.
      It sets a single bit in a cpumask instead of returning a cpu number,
      because that's what the callers want.
      
      It could be made more efficient by passing the previous cpu rather than
      an index, but that would be more invasive to the callers.
      
      Fixes: da91309e
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (then rebased)
      Tested-by: NAmir Vadai <amirv@mellanox.com>
      Acked-by: NAmir Vadai <amirv@mellanox.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      f36963c9
  24. 01 5月, 2015 2 次提交
  25. 03 4月, 2015 5 次提交