1. 30 Oct 2016 (7 commits)
  2. 14 Oct 2016 (1 commit)
    • net/mlx4_en: fixup xdp tx irq to match rx · 958b3d39
      Authored by Brenden Blanco
      In cases where the number of tx rings is not a multiple of the number of
      rx rings, the tx completion event will be handled on a different core
      from the transmit and population of the ring. Races on the ring will
      lead to a double-free of the page, and possibly other corruption.
      
      The rings are initialized by default with a valid multiple of rings,
      based on the number of cpus, therefore an invalid configuration requires
      ethtool to change the ring layout. For instance 'ethtool -L eth0 rx 9 tx
      8' will cause packets received on rx0, and XDP_TX'd to tx48, to be
      completed on cpu3 (48 % 9 == 3).
      
      Resolve this discrepancy by shifting the irq for the xdp tx queues to
      start again from 0, modulo rx_ring_num.
      
      Fixes: 9ecc2d86 ("net/mlx4_en: add xdp forwarding and data write support")
      Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
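      The arithmetic above can be reproduced in a few lines. A minimal
      standalone C sketch (ring indices and counts are illustrative, not
      taken from the driver):

      #include <stdio.h>

      int main(void)
      {
              int rx_ring_num = 9;        /* ethtool -L eth0 rx 9 */
              int xdp_tx_ring = 48;       /* ring used for XDP_TX from rx0 */

              /* Before the fix: completion vectors follow the absolute tx
               * ring index, so ring 48 completes on cpu (48 % 9) == 3,
               * not on cpu0 where rx0 is handled.
               */
              int cpu_before = xdp_tx_ring % rx_ring_num;

              /* After the fix: the xdp tx queue irqs are numbered from 0
               * again, modulo rx_ring_num, so the first xdp tx ring pairs
               * with rx0 and completes on the same core.
               */
              int xdp_index = 0;
              int cpu_after = xdp_index % rx_ring_num;

              printf("before: cpu%d, after: cpu%d\n", cpu_before, cpu_after);
              return 0;
      }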
  3. 08 Oct 2016 (1 commit)
    • IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets · fd10ed8e
      Authored by Jack Morgenstein
      In MLX qp packets, the LRH (built by the driver) has both a VL field
      and an SL field. When building a QP1 packet, the VL field should
      reflect the SLtoVL mapping and not arbitrarily contain zero (as is
      done now). This bug causes credit problems in IB switches at
      high rates of QP1 packets.
      
      The fix is to cache the SL to VL mapping in the driver, and look up
      the VL mapped to the SL provided in the send request when sending
      QP1 packets.
      
      For FW versions which support generating a port_management_config_change
      event with subtype sl-to-vl-table-change, the driver uses that event
      to update its sl-to-vl mapping cache.  Otherwise, the driver snoops
      incoming SMP mads to update the cache.
      
      There remains the case where the FW is running in secure-host mode
      (so no QP0 packets are delivered to the driver), and the FW does not
      generate the sl2vl mapping change event. To support this case, the
      driver updates (via querying the FW) its sl2vl mapping cache when
      running in secure-host mode when it receives either a Port Up event
      or a client-reregister event (where the port is still up, but there
      may have been an opensm failover).
      OpenSM modifies the sl2vl mapping before Port Up and Client-reregister
      events occur, so if there is a mapping change the driver's cache will
      be properly updated.
      
      Fixes: 225c7b1f ("IB/mlx4: Add a driver for Mellanox ConnectX InfiniBand adapters")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
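      A hedged sketch of the caching idea described above; the table
      layout, names, and sizes here are illustrative, not the driver's
      actual structures:

      #include <stdint.h>
      #include <string.h>

      #define DEMO_NUM_PORTS 2
      #define DEMO_NUM_SLS  16

      /* One SL->VL table per port, refreshed from the sl2vl change event,
       * from snooped SMP mads, or from a FW query on Port Up /
       * client-reregister in secure-host mode.
       */
      static uint8_t demo_sl2vl[DEMO_NUM_PORTS][DEMO_NUM_SLS];

      static void demo_sl2vl_update(int port, const uint8_t *table)
      {
              memcpy(demo_sl2vl[port], table, DEMO_NUM_SLS);
      }

      /* Used when building the LRH of a QP1 packet: take the VL from the
       * cached mapping instead of hard-coding zero.
       */
      static uint8_t demo_sl_to_vl(int port, uint8_t sl)
      {
              return demo_sl2vl[port][sl & (DEMO_NUM_SLS - 1)];
      }

      int main(void)
      {
              uint8_t table[DEMO_NUM_SLS] = { 0, 1, 2, 3 };

              demo_sl2vl_update(0, table);
              return demo_sl_to_vl(0, 2);     /* VL 2 for SL 2 on port 0 */
      }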
  4. 30 Sep 2016 (1 commit)
    • mlx4: remove unused fields · 5038056e
      Authored by David Decotigny
      This can also address the following UBSAN warnings:
      [   36.640343] ================================================================================
      [   36.648772] UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlx4/fw.c:857:26
      [   36.656853] shift exponent 64 is too large for 32-bit type 'int'
      [   36.663348] ================================================================================
      [   36.671783] ================================================================================
      [   36.680213] UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlx4/fw.c:861:27
      [   36.688297] shift exponent 35 is too large for 32-bit type 'int'
      [   36.694702] ================================================================================
      
      Tested:
        reboot with UBSAN, no warning.
      Signed-off-by: David Decotigny <decot@googlers.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
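      The warnings above are instances of the generic "shift wider than
      the operand type" undefined behaviour; the commit resolves them by
      removing the unused fields rather than by changing the shifts. A
      minimal illustration of the underlying problem (not the fw.c code):

      #include <stdint.h>
      #include <stdio.h>

      int main(void)
      {
              int shift = 35;

              /* Undefined behaviour: shifting a 32-bit int by >= 32 bits.
               * This is the kind of expression UBSAN flags.
               *
               *   uint32_t bad = 1 << shift;
               */

              /* Well-defined: widen the operand before shifting. */
              uint64_t ok = 1ULL << shift;

              printf("%llu\n", (unsigned long long)ok);
              return 0;
      }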
  5. 24 Sep 2016 (5 commits)
  6. 22 Sep 2016 (6 commits)
  7. 20 Sep 2016 (1 commit)
  8. 19 Sep 2016 (1 commit)
  9. 16 Sep 2016 (1 commit)
  10. 12 Sep 2016 (4 commits)
  11. 07 Sep 2016 (1 commit)
    • net/mlx4_en: protect ring->xdp_prog with rcu_read_lock · 326fe02d
      Authored by Brenden Blanco
      Depending on the preempt mode, the bpf_prog stored in xdp_prog may be
      freed despite the use of call_rcu inside bpf_prog_put. The situation is
      possible when running in PREEMPT_RCU=y mode, for instance, since the rcu
      callback for destroying the bpf prog can run even during the bh handling
      in the mlx4 rx path.
      
      Several options were considered before this patch was settled on:
      
      Add a napi_synchronize loop in mlx4_xdp_set, which would occur after all
      of the rings are updated with the new program.
      This approach has the disadvantage that as the number of rings
      increases, the speed of update will slow down significantly due to
      napi_synchronize's msleep(1).
      
      Add a new rcu_head in bpf_prog_aux, to be used by a new bpf_prog_put_bh.
      The action of the bpf_prog_put_bh would be to then call bpf_prog_put
      later. Those drivers that consume a bpf prog in a bh context (like mlx4)
      would then use the bpf_prog_put_bh instead when the ring is up. This has
      the problem of complexity, in maintaining proper refcnts and rcu lists,
      and would likely be harder to review. In addition, this approach to
      freeing must be exclusive with other frees of the bpf prog, for instance
      a _bh prog must not be referenced from a prog array that is consumed by
      a non-_bh prog.
      
      The placement of rcu_read_lock in this patch is functionally the same as
      putting an rcu_read_lock in napi_poll. Actually doing so could be a
      potentially controversial change, but would bring the implementation in
      line with sk_busy_loop (though of course the nature of those two paths
      is substantially different), and would also avoid future copy/paste
      problems with future supporters of XDP. Still, this patch does not take
      that opinionated option.
      
      Testing was done with kernels in either PREEMPT_RCU=y or
      CONFIG_PREEMPT_VOLUNTARY=y+PREEMPT_RCU=n modes, with neither exhibiting
      any drawback. With PREEMPT_RCU=n, the extra call to rcu_read_lock did
      not show up in the perf report whatsoever, and with PREEMPT_RCU=y the
      overhead of rcu_read_lock (according to perf) was the same before/after.
      In the rx path, rcu_read_lock is eventually called for every packet
      from netif_receive_skb_internal, so the napi poll call's rcu_read_lock
      is easily amortized.
      
      v2:
      Remove extra rcu_read_lock in mlx4_en_process_rx_cq body
      Annotate xdp_prog with __rcu, and convert all usages to rcu_assign or
      rcu_dereference[_protected] as appropriate.
      Add explicit mutex lock around rcu_assign instead of xchg loop.
      
      Fixes: d576acf0 ("net/mlx4_en: add page recycle to prepare rx ring for tx support")
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
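      A schematic kernel-style sketch of the pattern described above;
      structure and function names are illustrative, not the driver's, and
      only the mutex-protected rcu_assign_pointer plus rcu_dereference
      usage mirrors the v2 notes:

      #include <linux/bpf.h>
      #include <linux/filter.h>
      #include <linux/mutex.h>
      #include <linux/rcupdate.h>

      struct demo_rx_ring {
              struct bpf_prog __rcu *xdp_prog;
      };

      static DEFINE_MUTEX(demo_state_lock);

      /* Writer: publish the new program under the mutex, then drop the
       * reference on the old one.
       */
      static void demo_xdp_set(struct demo_rx_ring *ring,
                               struct bpf_prog *new_prog)
      {
              struct bpf_prog *old;

              mutex_lock(&demo_state_lock);
              old = rcu_dereference_protected(ring->xdp_prog,
                              lockdep_is_held(&demo_state_lock));
              rcu_assign_pointer(ring->xdp_prog, new_prog);
              mutex_unlock(&demo_state_lock);

              if (old)
                      bpf_prog_put(old);
      }

      /* Reader (rx napi poll): dereference only inside an RCU section so
       * the program cannot be freed while it is being run.
       */
      static u32 demo_run_xdp(struct demo_rx_ring *ring, struct xdp_buff *xdp)
      {
              struct bpf_prog *prog;
              u32 act = XDP_PASS;

              rcu_read_lock();
              prog = rcu_dereference(ring->xdp_prog);
              if (prog)
                      act = bpf_prog_run_xdp(prog, xdp);
              rcu_read_unlock();

              return act;
      }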
  12. 04 Aug 2016 (2 commits)
  13. 26 Jul 2016 (1 commit)
  14. 21 Jul 2016 (1 commit)
  15. 20 Jul 2016 (6 commits)
    • net/mlx4_en: add xdp forwarding and data write support · 9ecc2d86
      Authored by Brenden Blanco
      A user will now be able to loop packets back out of the same port using
      a bpf program attached to the xdp hook. Updates to the packet contents
      from the bpf program are also supported.
      
      For the packet write feature to work, the rx buffers are now mapped as
      bidirectional when the page is allocated. This occurs only when the xdp
      hook is active.
      
      When the program returns a TX action, enqueue the packet directly to a
      dedicated tx ring, so as to avoid completely any locking. This requires
      the tx ring to be allocated 1:1 for each rx ring, as well as the tx
      completion running in the same softirq.
      
      Upon tx completion, this dedicated tx ring recycles pages without
      unmapping directly back to the original rx ring. In steady state tx/drop
      workload, effectively 0 page allocs/frees will occur.
      
      In order to separate out the paths between free and recycle, a
      free_tx_desc func pointer is introduced that is optionally updated
      whenever recycle_ring is activated. By default the original free
      function is always initialized.
      Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
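      A hedged standalone sketch of the free/recycle split via a function
      pointer, as described in the last paragraph; all names here are
      illustrative:

      #include <stdio.h>

      struct demo_page { int id; };
      struct demo_tx_ring;

      typedef void (*demo_free_tx_desc_fn)(struct demo_tx_ring *ring,
                                           struct demo_page *pg);

      struct demo_rx_ring { int dummy; };

      struct demo_tx_ring {
              struct demo_rx_ring *recycle_ring;   /* NULL unless XDP is on */
              demo_free_tx_desc_fn free_tx_desc;
      };

      /* Default completion path: unmap the buffer and release the page. */
      static void demo_free_desc(struct demo_tx_ring *ring, struct demo_page *pg)
      {
              (void)ring;
              printf("unmap and free page %d\n", pg->id);
      }

      /* Recycle path: hand the still-mapped page back to the paired rx ring. */
      static void demo_recycle_desc(struct demo_tx_ring *ring, struct demo_page *pg)
      {
              (void)ring;
              printf("recycle page %d to the rx ring\n", pg->id);
      }

      int main(void)
      {
              struct demo_rx_ring rx = { 0 };
              struct demo_page pg = { 1 };
              /* The original free routine is always installed by default. */
              struct demo_tx_ring tx = { NULL, demo_free_desc };

              /* Attaching a recycle ring swaps in the recycling routine. */
              tx.recycle_ring = &rx;
              tx.free_tx_desc = demo_recycle_desc;

              tx.free_tx_desc(&tx, &pg);
              return 0;
      }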
    • net/mlx4_en: break out tx_desc write into separate function · 224e92e0
      Authored by Brenden Blanco
      In preparation for writing the tx descriptor from multiple functions,
      create a helper for both normal and blueflame access.
      Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: add page recycle to prepare rx ring for tx support · d576acf0
      Authored by Brenden Blanco
      The mlx4 driver by default allocates order-3 pages for the ring to
      consume in multiple fragments. When the device has an xdp program, this
      behavior will prevent tx actions since the page must be re-mapped in
      TODEVICE mode, which cannot be done if the page is still shared.
      
      Start by making the allocator configurable based on whether xdp is
      running, such that order-0 pages are always used and never shared.
      
      Since this will stress the page allocator, add a simple page cache to
      each rx ring. Pages in the cache are left dma-mapped, and in drop-only
      stress tests the page allocator is eliminated from the perf report.
      
      Note that setting an xdp program will now require the rings to be
      reconfigured.
      
      Before:
       26.91%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_process_rx_cq
       17.88%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_alloc_frags
        6.00%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_free_frag
        4.49%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
        3.21%  swapper      [kernel.vmlinux]  [k] intel_idle
        2.73%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
        2.57%  swapper      [mlx4_en]         [k] mlx4_en_process_rx_cq
      
      After:
       31.72%  swapper      [kernel.vmlinux]       [k] intel_idle
        8.79%  swapper      [mlx4_en]              [k] mlx4_en_process_rx_cq
        7.54%  swapper      [kernel.vmlinux]       [k] poll_idle
        6.36%  swapper      [mlx4_core]            [k] mlx4_eq_int
        4.21%  swapper      [kernel.vmlinux]       [k] tasklet_action
        4.03%  swapper      [kernel.vmlinux]       [k] cpuidle_enter_state
        3.43%  swapper      [mlx4_en]              [k] mlx4_en_prepare_rx_desc
        2.18%  swapper      [kernel.vmlinux]       [k] native_irq_return_iret
        1.37%  swapper      [kernel.vmlinux]       [k] menu_select
        1.09%  swapper      [kernel.vmlinux]       [k] bpf_map_lookup_elem
      Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
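      A hedged sketch of the per-ring page cache idea (size and names are
      illustrative): tx completion pushes still-mapped pages, and rx refill
      pops them before falling back to the page allocator:

      #include <stdbool.h>
      #include <stddef.h>

      #define DEMO_CACHE_SIZE 128

      struct demo_page_cache {
              size_t index;
              void *pages[DEMO_CACHE_SIZE];   /* entries stay dma-mapped */
      };

      /* Tx completion: try to stash the page instead of unmapping it. */
      static bool demo_cache_put(struct demo_page_cache *cache, void *page)
      {
              if (cache->index >= DEMO_CACHE_SIZE)
                      return false;           /* caller unmaps and frees */
              cache->pages[cache->index++] = page;
              return true;
      }

      /* Rx refill: prefer a cached page over a fresh order-0 allocation. */
      static void *demo_cache_get(struct demo_page_cache *cache)
      {
              if (cache->index == 0)
                      return NULL;            /* caller hits the allocator */
              return cache->pages[--cache->index];
      }

      int main(void)
      {
              struct demo_page_cache cache = { 0 };
              int page;

              demo_cache_put(&cache, &page);
              return demo_cache_get(&cache) == &page ? 0 : 1;
      }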
    • net/mlx4_en: add support for fast rx drop bpf program · 47a38e15
      Authored by Brenden Blanco
      Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver.
      
      In tc/socket bpf programs, helpers linearize skb fragments as needed
      when the program touches the packet data. However, in the pursuit of
      speed, XDP programs will not be allowed to use these slower functions,
      especially if it involves allocating an skb.
      
      Therefore, disallow MTU settings that would produce a multi-fragment
      packet that XDP programs would fail to access. Future enhancements could
      be done to increase the allowable MTU.
      
      The xdp program is present as a per-ring data structure, but as of yet
      it is not possible to set at that granularity through any ndo.
      Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
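      For reference, the kind of program this hook runs: a minimal XDP
      drop-everything program in restricted C. The section name and the
      bpf_helpers.h header follow the usual libbpf conventions and are
      assumptions, not part of this commit:

      #include <linux/bpf.h>
      #include <bpf/bpf_helpers.h>

      /* Drop every packet as early as possible in the driver rx path. */
      SEC("xdp")
      int xdp_drop_all(struct xdp_md *ctx)
      {
              (void)ctx;
              return XDP_DROP;
      }

      char _license[] SEC("license") = "GPL";

      With a recent toolchain this would typically be built with
      clang -O2 -target bpf and attached with something like
      'ip link set dev eth0 xdp obj xdp_drop.o sec xdp'.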
    • net/mlx4_en: Add resilience in low memory systems · ec25bc04
      Authored by Eugenia Emantayev
      This patch fixes the loss of the Ethernet port on low-memory systems,
      when the driver frees its resources and then fails to allocate new ones.
      The issue could happen while changing the number of channels, the ring
      size, or the timestamp configuration.
      This fix is necessary because vmap use was removed from the code.
      When vmap was in use, the driver could allocate non-contiguous memory
      and make it contiguous with vmap. Now it can fail to allocate a large
      chunk of contiguous memory and lose the port.
      The code now tries to allocate the new resources first and frees the
      old resources only upon success.
      
      Fixes: 73898db0 ('net/mlx4: Avoid wrong virtual mappings')
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
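      A minimal sketch of the ordering described above (the structures are
      illustrative): allocate the new resources first and retire the old
      ones only if that succeeded, so a failed allocation leaves the port
      usable:

      #include <stdlib.h>

      struct demo_resources { void *rings; };

      static int demo_reconfigure(struct demo_resources *res, size_t new_size)
      {
              void *fresh = malloc(new_size);

              if (!fresh)
                      return -1;      /* old resources untouched, port stays up */

              free(res->rings);       /* retire the old resources only now */
              res->rings = fresh;
              return 0;
      }

      int main(void)
      {
              struct demo_resources res = { malloc(64) };

              return demo_reconfigure(&res, 128);
      }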
    • net/mlx4_en: Move filters cleanup to a proper location · 30f56e3c
      Authored by Eugenia Emantayev
      Filter cleanup should be done once, before destroying the net device,
      since the filter list is contained in the private data.
      
      Fixes: 1eb8c695 ('net/mlx4_en: Add accelerated RFS support')
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 05 Jul 2016 (1 commit)