Commits · 67f8b1dcb9ee7f1e165da4eb2ec53483a6b141ea · openeuler / Kernel

03 Nov, 2016 2 commits

net/mlx4_en: Refactor the XDP forwarding rings scheme · 67f8b1dc

Committed by Tariq Toukan on Nov 02, 2016

Separately manage the two types of TX rings: regular ones, and XDP.
Upon an XDP set, do not borrow regular TX rings and convert them
into XDP ones, but allocate new ones, unless we hit the max number
of rings.
Which means that in systems with smaller #cores we will not consume
the current TX rings for XDP, while we are still in the num TX limit.

XDP TX rings counters are not shown in ethtool statistics.
Instead, XDP counters will be added to the respective RX rings
in a downstream patch.

This has no performance implications.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
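
The allocation policy described in the refactoring commit above can be pictured with a small userspace sketch. It assumes one XDP TX ring per RX ring (as the XDP forwarding commit further down this log describes) and a hypothetical cap `MAX_TX_RINGS_SKETCH`; the names `ring_plan` and `plan_tx_rings` are illustrative and are not taken from the driver.

```c
#include <assert.h>

/* Hypothetical cap on the total number of TX rings (regular + XDP). */
#define MAX_TX_RINGS_SKETCH 32u

struct ring_plan {
    unsigned int regular_tx; /* rings used by the normal transmit path */
    unsigned int xdp_tx;     /* rings dedicated to XDP forwarding */
};

/*
 * Plan the ring counts when XDP is switched on or off.
 * XDP rings are added on top of the regular ones; the regular set only
 * shrinks when the combined count would exceed the assumed cap.
 * Assumes rx_rings does not itself exceed the cap.
 */
static struct ring_plan plan_tx_rings(unsigned int regular_tx,
                                      unsigned int rx_rings, int xdp_on)
{
    struct ring_plan plan = { regular_tx, 0 };

    if (!xdp_on)
        return plan;

    plan.xdp_tx = rx_rings;
    if (plan.regular_tx + plan.xdp_tx > MAX_TX_RINGS_SKETCH)
        plan.regular_tx = plan.xdp_tx < MAX_TX_RINGS_SKETCH
                        ? MAX_TX_RINGS_SKETCH - plan.xdp_tx : 0;
    return plan;
}
```

With 8 regular rings and 8 RX rings, the plan keeps all 8 regular rings and adds 8 XDP rings. With 28 regular rings, the regular set is trimmed to 24 so that the total stays at the assumed cap of 32.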

net/mlx4_en: Add TX_XDP for CQ types · ccc109b8

Committed by Tariq Toukan on Nov 02, 2016

Support XDP CQ type, and refactor the CQ type enum.
Rename the is_tx field to match the change.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Sep, 2016 1 commit

net/mlx4_en: Fixes for DCBX · 564ed9b1

Committed by Tariq Toukan on Sep 11, 2016

This patch adds a capability check before enabling DCBX.
In addition, it re-organizes the relevant data structures,
and fixes a typo in a define.

Fixes: af7d5185 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Sep, 2016 1 commit

net/mlx4_en: protect ring->xdp_prog with rcu_read_lock · 326fe02d

Committed by Brenden Blanco on Sep 03, 2016

Depending on the preempt mode, the bpf_prog stored in xdp_prog may be
freed despite the use of call_rcu inside bpf_prog_put. The situation is
possible when running in PREEMPT_RCU=y mode, for instance, since the rcu
callback for destroying the bpf prog can run even during the bh handling
in the mlx4 rx path.

Several options were considered before this patch was settled on:

Add a napi_synchronize loop in mlx4_xdp_set, which would occur after all
of the rings are updated with the new program.
This approach has the disadvantage that as the number of rings
increases, the speed of update will slow down significantly due to
napi_synchronize's msleep(1).

Add a new rcu_head in bpf_prog_aux, to be used by a new bpf_prog_put_bh.
The action of the bpf_prog_put_bh would be to then call bpf_prog_put
later. Those drivers that consume a bpf prog in a bh context (like mlx4)
would then use the bpf_prog_put_bh instead when the ring is up. This has
the problem of complexity, in maintaining proper refcnts and rcu lists,
and would likely be harder to review. In addition, this approach to
freeing must be exclusive with other frees of the bpf prog, for instance
a _bh prog must not be referenced from a prog array that is consumed by
a non-_bh prog.

The placement of rcu_read_lock in this patch is functionally the same as
putting an rcu_read_lock in napi_poll. Actually doing so could be a
potentially controversial change, but would bring the implementation in
line with sk_busy_loop (though of course the nature of those two paths
is substantially different), and would also avoid future copy/paste
problems with future supporters of XDP. Still, this patch does not take
that opinionated option.

Testing was done with kernels in either PREEMPT_RCU=y or
CONFIG_PREEMPT_VOLUNTARY=y+PREEMPT_RCU=n modes, with neither exhibiting
any drawback. With PREEMPT_RCU=n, the extra call to rcu_read_lock did
not show up in the perf report whatsoever, and with PREEMPT_RCU=y the
overhead of rcu_read_lock (according to perf) was the same before/after.
In the rx path, rcu_read_lock is eventually called for every packet
from netif_receive_skb_internal, so the napi poll call's rcu_read_lock
is easily amortized.

v2:
Remove extra rcu_read_lock in mlx4_en_process_rx_cq body
Annotate xdp_prog with __rcu, and convert all usages to rcu_assign or
rcu_dereference[_protected] as appropriate.
Add explicit mutex lock around rcu_assign instead of xchg loop.

Fixes: d576acf0 ("net/mlx4_en: add page recycle to prepare rx ring for tx support")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Jul, 2016 4 commits

net/mlx4_en: add xdp forwarding and data write support · 9ecc2d86

Committed by Brenden Blanco on Jul 19, 2016

A user will now be able to loop packets back out of the same port using
a bpf program attached to the xdp hook. Updates to the packet contents from
the bpf program are also supported.

For the packet write feature to work, the rx buffers are now mapped as
bidirectional when the page is allocated. This occurs only when the xdp
hook is active.

When the program returns a TX action, enqueue the packet directly to a
dedicated tx ring, so as to avoid completely any locking. This requires
the tx ring to be allocated 1:1 for each rx ring, as well as the tx
completion running in the same softirq.

Upon tx completion, this dedicated tx ring recycles pages without
unmapping directly back to the original rx ring. In steady state tx/drop
workload, effectively 0 page allocs/frees will occur.

In order to separate out the paths between free and recycle, a
free_tx_desc func pointer is introduced that is optionally updated
whenever recycle_ring is activated. By default the original free
function is always initialized.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
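
The `free_tx_desc` function pointer mentioned in the commit above separates the free path from the recycle path. The following is a minimal userspace sketch of that dispatch pattern only; the struct, the counters, and every function name here are hypothetical stand-ins, not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>

struct xdp_tx_ring_sketch;
typedef void (*free_desc_fn)(struct xdp_tx_ring_sketch *ring);

struct xdp_tx_ring_sketch {
    free_desc_fn free_tx_desc; /* completion handler chosen per ring */
    int freed;                 /* descriptors released the normal way */
    int recycled;              /* descriptors handed back to an rx ring */
};

/* Default handler: stands in for unmapping and freeing the buffer. */
static void free_desc_default_sketch(struct xdp_tx_ring_sketch *ring)
{
    ring->freed++;
}

/* Recycle handler: stands in for returning the still-mapped page. */
static void free_desc_recycle_sketch(struct xdp_tx_ring_sketch *ring)
{
    ring->recycled++;
}

/* The default free function is always installed at init time. */
static void tx_ring_init_sketch(struct xdp_tx_ring_sketch *ring)
{
    ring->free_tx_desc = free_desc_default_sketch;
    ring->freed = 0;
    ring->recycled = 0;
}

/* Optionally switch the ring to the recycle handler. */
static void tx_ring_set_recycle_sketch(struct xdp_tx_ring_sketch *ring, bool on)
{
    ring->free_tx_desc = on ? free_desc_recycle_sketch
                            : free_desc_default_sketch;
}

/* The completion path calls through the pointer without branching. */
static void tx_ring_complete_sketch(struct xdp_tx_ring_sketch *ring)
{
    ring->free_tx_desc(ring);
}
```

Keeping the choice in a per-ring pointer means the completion loop itself does not need to know whether a ring is a regular or a recycling one.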

net/mlx4_en: add page recycle to prepare rx ring for tx support · d576acf0

Committed by Brenden Blanco on Jul 19, 2016

The mlx4 driver by default allocates order-3 pages for the ring to
consume in multiple fragments. When the device has an xdp program, this
behavior will prevent tx actions since the page must be re-mapped in
TODEVICE mode, which cannot be done if the page is still shared.

Start by making the allocator configurable based on whether xdp is
running, such that order-0 pages are always used and never shared.

Since this will stress the page allocator, add a simple page cache to
each rx ring. Pages in the cache are left dma-mapped, and in drop-only
stress tests the page allocator is eliminated from the perf report.

Note that setting an xdp program will now require the rings to be
reconfigured.

Before:
 26.91%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_process_rx_cq
 17.88%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_alloc_frags
  6.00%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_free_frag
  4.49%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
  3.21%  swapper      [kernel.vmlinux]  [k] intel_idle
  2.73%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
  2.57%  swapper      [mlx4_en]         [k] mlx4_en_process_rx_cq

After:
 31.72%  swapper      [kernel.vmlinux]       [k] intel_idle
  8.79%  swapper      [mlx4_en]              [k] mlx4_en_process_rx_cq
  7.54%  swapper      [kernel.vmlinux]       [k] poll_idle
  6.36%  swapper      [mlx4_core]            [k] mlx4_eq_int
  4.21%  swapper      [kernel.vmlinux]       [k] tasklet_action
  4.03%  swapper      [kernel.vmlinux]       [k] cpuidle_enter_state
  3.43%  swapper      [mlx4_en]              [k] mlx4_en_prepare_rx_desc
  2.18%  swapper      [kernel.vmlinux]       [k] native_irq_return_iret
  1.37%  swapper      [kernel.vmlinux]       [k] menu_select
  1.09%  swapper      [kernel.vmlinux]       [k] bpf_map_lookup_elem
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
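
The "simple page cache" added to each rx ring in the commit above can be thought of as a small fixed-size stack of pages. This is a userspace sketch under that assumption; the cache depth and all names are hypothetical, and DMA mapping is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_CACHE_SLOTS_SKETCH 4 /* hypothetical cache depth */

struct rx_page_cache_sketch {
    void *slots[PAGE_CACHE_SLOTS_SKETCH];
    size_t count;
};

/*
 * Try to keep a page for reuse. Returns 1 on success, or 0 when the
 * cache is full and the caller has to release the page for real.
 */
static int page_cache_put(struct rx_page_cache_sketch *cache, void *page)
{
    if (cache->count >= PAGE_CACHE_SLOTS_SKETCH)
        return 0;
    cache->slots[cache->count++] = page;
    return 1;
}

/*
 * Take the most recently cached page, or NULL when the cache is empty
 * and the caller has to fall back to the page allocator.
 */
static void *page_cache_get(struct rx_page_cache_sketch *cache)
{
    if (cache->count == 0)
        return NULL;
    return cache->slots[--cache->count];
}
```

In a steady drop-only workload, every page put back is taken out again for the next descriptor, so the allocator fallback is rarely reached, which is consistent with the perf reports quoted in the commit.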

net/mlx4_en: add support for fast rx drop bpf program · 47a38e15

Committed by Brenden Blanco on Jul 19, 2016

Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver.

In tc/socket bpf programs, helpers linearize skb fragments as needed
when the program touches the packet data. However, in the pursuit of
speed, XDP programs will not be allowed to use these slower functions,
especially if it involves allocating an skb.

Therefore, disallow MTU settings that would produce a multi-fragment
packet that XDP programs would fail to access. Future enhancements could
be done to increase the allowable MTU.

The xdp program is present as a per-ring data structure, but as of yet
it is not possible to set at that granularity through any ndo.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Add resilience in low memory systems · ec25bc04

Committed by Eugenia Emantayev on Jul 18, 2016

This patch fixes the loss of the Ethernet port on low-memory systems,
when the driver frees its resources and fails to allocate new resources.
Issue could happen while changing number of channels, rings size or
changing the timestamp configuration.
This fix is necessary because of removing vmap use in the code.
When vmap was in use driver could allocate non-contiguous memory
and make it contiguous with vmap. Now it could fail to allocate
a large chunk of contiguous memory and lose the port.
Current code tries to allocate new resources and then upon success
frees the old resources.

Fixes: 73898db0 ('net/mlx4: Avoid wrong virtual mappings')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
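
The ordering described in the commit above (allocate the new resources first, free the old ones only on success) can be sketched in plain C. The allocator is passed in as a parameter purely so the failure path can be exercised; `ring_resources_sketch`, `ring_resize_sketch`, and `always_fail_alloc` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t size);

struct ring_resources_sketch {
    void *buf;
    size_t size;
};

/*
 * Resize by allocating the replacement first. If the allocation fails,
 * the existing buffer is left untouched, so the caller keeps a working
 * configuration instead of ending up with nothing.
 */
static int ring_resize_sketch(struct ring_resources_sketch *ring,
                              size_t new_size, alloc_fn alloc)
{
    void *new_buf = alloc(new_size);

    if (!new_buf)
        return -1;

    free(ring->buf);
    ring->buf = new_buf;
    ring->size = new_size;
    return 0;
}

/* Allocator stub that always fails, used to exercise the error path. */
static void *always_fail_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

Calling `ring_resize_sketch(&ring, 64, always_fail_alloc)` returns -1 and leaves `ring.buf` and `ring.size` as they were, while the same call with `malloc` swaps in the new buffer.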

24 Jun, 2016 1 commit

net/mlx4_en: Add DCB PFC support through CEE netlink commands · af7d5185

Committed by Rana Shahout on Jun 21, 2016

This patch adds support for reading and updating priority flow
control (PFC) attributes in the driver via netlink.
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Jun, 2016 1 commit

mlx4_en: Replace ndo_add/del_vxlan_port with ndo_add/del_udp_enc_port · a831274a

Committed by Alexander Duyck on Jun 16, 2016

This change replaces the network device operations for adding or removing a
VXLAN port with operations that are more generically defined to be used for
any UDP offload port but provide a type.  As such by just adding a line to
verify that the offload type is VXLAN we can maintain the same
functionality.

In addition I updated the socket address family check so that instead of
excluding IPv6 we instead abort if the type is not IPv4.  This makes much more
sense as we should only be supporting IPv4 outer addresses on this
hardware.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 May, 2016 3 commits

net/mlx4_en: get rid of private net_device_stats · f73a6f43

Committed by Eric Dumazet on May 25, 2016

We simply can use the standard net_device stats.

We do not need to clear fields that are already 0.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: get rid of ret_stats · 9ed17db1

Committed by Eric Dumazet on May 25, 2016

mlx4 uses a private struct net_device_stats in a vain attempt
to avoid races.

This is buggy because multiple cpus could call mlx4_en_get_stats()
at the same time, so ret_stats can not guarantee stable results.

To fix this, we need to switch to ndo_get_stats64() as this
method provides per-thread storage.

This allows reducing mlx4_en_priv bloat.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: fix tx_dropped bug · 63a664b7

Committed by Eric Dumazet on May 25, 2016

1) mlx4_en_xmit() can increment priv->stats.tx_dropped, but this variable
is overwritten in mlx4_en_DUMP_ETH_STATS().

2) This increment was not SMP safe, as a port might have many TX queues.

Add a per TX ring tx_dropped to fix these issues.

This is u32 as mlx4_en_DUMP_ETH_STATS() will add a 32bit field.

So let's avoid bugs with SNMP agents having to cope with partial
overwraps. (One of these agents being bond_fold_stats())
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
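
The per-ring counter scheme from the commit above can be illustrated with a short userspace sketch: each TX ring keeps its own 32-bit `tx_dropped`, and the reported value adds those to a 32-bit figure standing in for the hardware one. The struct and function names are hypothetical; only the 32-bit width comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

struct tx_ring_stats_sketch {
    uint32_t tx_dropped; /* only touched by the owner of this ring */
};

/*
 * Fold the per-ring software drops into the 32-bit hardware figure.
 * Keeping everything at 32 bits means the total wraps at a single,
 * consistent point rather than mixing widths.
 */
static uint32_t total_tx_dropped_sketch(const struct tx_ring_stats_sketch *rings,
                                        unsigned int nrings,
                                        uint32_t hw_dropped)
{
    uint32_t total = hw_dropped;

    for (unsigned int i = 0; i < nrings; i++)
        total += rings[i].tx_dropped;
    return total;
}
```

Because each ring increments only its own field, no two queues write to the same counter, and the summation is done on the read side.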

06 May, 2016 1 commit

net/mlx4: Avoid wrong virtual mappings · 73898db0

Committed by Haggai Abramovsky on May 04, 2016

The dma_alloc_coherent() function returns a virtual address which can
be used for coherent access to the underlying memory.  On some
architectures, like arm64, undefined behavior results if this memory is
also accessed via virtual mappings that are not coherent.  Because of
their undefined nature, operations like virt_to_page() return garbage
when passed virtual addresses obtained from dma_alloc_coherent().  Any
subsequent mappings via vmap() of the garbage page values are unusable
and result in bad things like bus errors (synchronous aborts in ARM64
speak).

The mlx4 driver contains code that does the equivalent of:
vmap(virt_to_page(dma_alloc_coherent)), this results in an OOPs when the
device is opened.

Prevent the Ethernet driver from running this problematic code by forcing it to
allocate contiguous memory. As for the Infiniband driver, at first we
are trying to allocate contiguous memory, but in case of failure roll
back to work with fragmented memory.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reported-by: David Daney <david.daney@cavium.com>
Tested-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Apr, 2016 1 commit

net/mlx4_en: Split SW RX dropped counter per RX ring · d21ed3a3

Committed by Eran Ben Elisha on Apr 20, 2016

Count SW packet drops per RX ring instead of a global counter. This
will allow monitoring the number of rx drops per ring.

In addition, SW rx_dropped counter was overwritten by HW rx_dropped
counter, sum both of them instead to show the accurate value.

Fixes: a3333b35 ('net/mlx4_en: Moderate ethtool callback to [...] ')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Feb, 2016 1 commit

net: mlx4: use new ETHTOOL_G/SSETTINGS API · 3d8f7cc7

Committed by David Decotigny on Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Nov, 2015 2 commits

mlx4: remove mlx4_en_low_latency_recv() · 868fdb06

Committed by Eric Dumazet on Nov 18, 2015

Busy polling can now be handled in generic NAPI poll infrastructure.
This removes complexity and fast path overhead :

mlx4 used two spin_lock()/spin_unlock() pairs per napi->poll() call
in mlx4_en_cq_lock_napi()/mlx4_en_cq_unlock_napi()

Tested:

Without busy polling :

lpaa23:~# echo 0 >/proc/sys/net/core/busy_read
lpaa24:~# echo 0 >/proc/sys/net/core/busy_read
lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    47330.78

With busy polling :

lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read
lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    97643.55
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4: mlx4_en_low_latency_recv() called with BH disabled · 5865316c

Committed by Eric Dumazet on Nov 18, 2015

mlx4_en_low_latency_recv() is called with BH disabled,
as other ndo_busy_poll() methods.

No need for spin_lock_bh()/spin_unlock_bh()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Oct, 2015 1 commit

net/mlx4_en: Implement mcast loopback prevention for ETH qps · 74194fb9

Committed by Maor Gottlieb on Oct 15, 2015

Set the mcast loopback prevention bit in the QPC for ETH MLX QPs (not
RSS QPs), when the firmware supports this feature. In addition, all rx
ring QPs need to be updated in order not to enforce loopback checks.
This prevents getting packets we sent both from the network stack and
the HCA. Loopback prevention is done by comparing the counter indices of
the sent and receiving QPs. If they're equal, packets aren't
loopback-ed.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

28 Jul, 2015 1 commit

net/mlx4_en: Add support for hardware accelerated 802.1ad vlan · e38af4fa

Committed by Hadar Hen Zion on Jul 27, 2015

To enable device support in accelerated 802.1ad vlan, the port
capability "packet has vlan enable" (phv_en) should be set.
Firmware won't work properly, in case phv_en is not set.

The user can enable "phv_en" port capability with the new ethtool
private flag phv-bit. The phv-bit private flag default value is OFF,
users who are interested in 802.1ad hardware acceleration should turn ON
the phv-bit private flag:
$ ethtool --set-priv-flags eth1 phv-bit on

Once the private flag is set, the device is ready for 802.1ad vlan
acceleration.

The user should also change the interface device features and turn on
"tx-vlan-stag-hw-insert" which is off by default:
$ ethtool -K eth1  tx-vlan-stag-hw-insert on

"phv-bit" private flag setting is available only for Physical
Functions(PF), the Virtual Function (VF) will be able to use the feature
by setting "tx-vlan-stag-hw-insert" ethtool device feature only if the
feature was enabled by the Hypervisor.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Jun, 2015 2 commits

net/mlx4_en: Wake TX queues only when there's enough room · 488a9b48

Committed by Ido Shamay on Jun 25, 2015

Indication of a single completed packet, marked by txbbs_skipped
being greater than zero, is not enough in order to wake up a
stopped TX queue. The completed packet may contain a single TXBB,
while next packet to be sent (after the wake up) may have multiple
TXBBs (LSO/TSO packets for example), causing overflow in queue followed
by WQE corruption and TX queue timeout.
Instead, wake the stopped queue only when there's enough room for the
worst case (maximum sized WQE) packet that we should need to handle after
the queue is opened again.

Also created a helper routine - mlx4_en_is_tx_ring_full, which checks
if the current TX ring is full or not. It provides better code readability
and removes code duplication.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
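
The wake-up rule in the commit above (wake only when a worst-case WQE fits) can be sketched as follows. This is a userspace illustration with hypothetical field and function names; it assumes free-running 32-bit producer and consumer indices counted in TXBBs, which is an assumption of the sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tx_ring_room_sketch {
    uint32_t prod;    /* TXBBs posted so far (free-running) */
    uint32_t cons;    /* TXBBs completed so far (free-running) */
    uint32_t size;    /* ring capacity in TXBBs */
    uint32_t max_wqe; /* TXBBs needed by the largest possible packet */
};

/*
 * The ring counts as full when the TXBBs still in flight leave less
 * room than a worst-case packet would need. Unsigned subtraction keeps
 * the result correct across index wrap-around.
 */
static bool tx_ring_is_full_sketch(const struct tx_ring_room_sketch *ring)
{
    uint32_t in_use = ring->prod - ring->cons;

    return in_use > ring->size - ring->max_wqe;
}

/* A stopped queue is woken only once the ring is no longer full. */
static bool tx_queue_should_wake_sketch(const struct tx_ring_room_sketch *ring,
                                        bool stopped)
{
    return stopped && !tx_ring_is_full_sketch(ring);
}
```

With a 64-TXBB ring and an 8-TXBB worst case, completing a single TXBB out of 60 in flight does not wake the queue in this sketch; the queue wakes once in-flight usage drops to 56 or fewer.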

net/mlx4_en: Release TX QP when destroying TX ring · 0eb08514

Committed by Eran Ben Elisha on Jun 25, 2015

TX ring QP wasn't released at mlx4_en_destroy_tx_ring. Instead, the code
used the deprecated base_tx_qpn field. Move TX QP release to
mlx4_en_destroy_tx_ring and remove the base_tx_qpn field.

Fixes: ddae0349 ('net/mlx4: Change QP allocation scheme')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Jun, 2015 2 commits

net/mlx4_en: Show PF own statistics via ethtool · b42de4d0

Committed by Eran Ben Elisha on Jun 15, 2015

Allow the user to observe the PF own statistics using ethtool with pf_
prefixed counter names.

Those counters are the PF statistics out of the overall port statistics.
Every PF QP is attached to a counter and the summary of those counters
is the PF statistics.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Allocate default counter per port · 6de5f7f6

Committed by Eran Ben Elisha on Jun 15, 2015

Default counter per port will be allocated at the mlx4 core driver load.

Every QP opened by the Ethernet driver will be attached to the port's default
counter. This is an infrastructure step to collect VF statistics from the PF.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 May, 2015 1 commit

net/mlx4: Add EQ pool · c66fa19c

Committed by Matan Barak on May 31, 2015

Previously, mlx4_en allocated EQs and used them exclusively.
This affected RoCE performance, as applications which are
events sensitive were limited to use only the legacy EQs.

Change that by introducing an EQ pool. This pool is managed
by mlx4_core. EQs are assigned to ports (when there are limited
number of EQs, multiple ports could be assigned to the same EQs).

An exception to this rule is the ASYNC EQ which handles various events.

Legacy EQs are completely removed as all EQs could be shared.

When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for
EQ serving on a specific port. The core driver calculates which
EQ should be assigned to that request.

Because IRQs are shared between IB and Ethernet modules, their
names only include the PCI device BDF address.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 May, 2015 1 commit

net/mlx4_en: Schedule napi when RX buffers allocation fails · 07841f9d

Committed by Ido Shamay on Apr 30, 2015

When the system is out of memory, refilling of RX buffers fails while
the driver continues to pass the received packets to the kernel stack.
At some point, when all RX buffers are depleted, the driver may fall into a
sleep, and not recover when memory for new RX buffers is once again
available. This is because hardware does not have valid descriptors,
so no interrupt will be generated for the driver to return to work
in napi context. Fix it by scheduling the napi poll function from the
stats_task delayed workqueue, as long as the allocations fail.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Apr, 2015 1 commit

net/mlx4_en: Add interface identify support · 51af33cf

Committed by Ido Shamay on Apr 02, 2015

Add support for the interface ethtool identify feature.

Make the physical port LED blink with green and yellow colors.

The device handles the LED blink by itself (synchronous use of
set_phys_id), by returning 0 to ETHTOOL_ID_ACTIVE command.
Signed-off-by: Eyal Grossman <eyalgr@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Apr, 2015 5 commits

net/mlx4_en: Add Flow control statistics display via ethtool · 0b131561

Committed by Matan Barak on Mar 30, 2015

Flow control per priority and Global pause counters are now visible via
ethtool.  The counters show statistics regarding pauses in the device.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Protect access to the statistics bitmap · 3da8a36c

Committed by Eran Ben Elisha on Mar 30, 2015

This will allow parallel access to the statistics bitmap.
A pre-step for adding PFC counters, where the statistics bitmap
can be dynamically changed when modifying the PFC setting.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Support general selective view of ethtool statistics · 6fcd2735

Committed by Eran Ben Elisha on Mar 30, 2015

The driver uses a bitmask to indicate which statistics should be
displayed to the user in ethtool. The bitmask is u64, therefore we are
limited for a selective view of up to 64 statistics. Extend the bitmap
in order to show more than 64 statistics.

In addition, add packet statistics to the ethtool display for PF.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
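
Moving from a single u64 mask to a wider bitmap, as the commit above describes, amounts to indexing into an array of machine words. The sketch below shows one portable way to do that in userspace C; the statistic count of 100 and all names are hypothetical, and the kernel has its own bitmap helpers that this does not reproduce.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_STATS_SKETCH 100u /* hypothetical: more than 64 statistics */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((NUM_STATS_SKETCH + WORD_BITS - 1) / WORD_BITS)

struct stats_bitmap_sketch {
    unsigned long words[BITMAP_WORDS];
};

/* Mark statistic idx as visible. */
static void stats_show_sketch(struct stats_bitmap_sketch *bm, size_t idx)
{
    bm->words[idx / WORD_BITS] |= 1UL << (idx % WORD_BITS);
}

/* Report whether statistic idx is currently visible. */
static bool stats_is_shown_sketch(const struct stats_bitmap_sketch *bm,
                                  size_t idx)
{
    return (bm->words[idx / WORD_BITS] >> (idx % WORD_BITS)) & 1UL;
}
```

Indices at or above 64 simply land in a later word, so the number of selectable statistics is bounded by the array length rather than by the width of one integer.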

net/mlx4_en: Move statistics bitmap setting to the Ethernet driver · ffa88f37

Committed by Eran Ben Elisha on Mar 30, 2015

The statistics bitmap belongs to the Ethernet driver, move it there.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Create new header file for all statistics info · b4b6e842

Committed by Eran Ben Elisha on Mar 30, 2015

Add mlx4_stats.h file and move all statistics structs and macros there.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Mar, 2015 1 commit

net/mlx4_en: Fix off-by-one in ethtool statistics display · a16f3565

Committed by Eran Ben Elisha on Mar 18, 2015

NUM_PORT_STATS was 9 instead of 10, which caused off-by-one bug when
displaying the statistics starting from tx_chksum_offload in ethtool.

Fixes: f8c6455b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Mar, 2015 1 commit

net/mlx4_en: Add QCN parameters and statistics handling · 708b869b

Committed by Shani Michaeli on Mar 05, 2015

Implement the IEEE DCB handlers for set/get QCN parameters and
statistics reading per TC.
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Feb, 2015 1 commit

net/mlx4_en: Port aggregation configuration · 5da03547

Committed by Moni Shoua on Feb 03, 2015

Capture NETDEV events generated by the bonding driver and based on that
make decisions of how to configure port aggregation in the mlx4 core driver.

This includes setting the V2P port table and re-creating the interested
interfaces in bonded/non-bonded mode.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Dec, 2014 1 commit

net/mlx4: Change QP allocation scheme · ddae0349

Committed by Eugenia Emantayev on Dec 11, 2014

When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.

The current Ethernet driver code reserves a Tx QP range with 256b alignment.

This is wrong because if there are more than 64 Tx QPs in use,
QPNs >= base + 65 will have bits 6/7 set.

This problem is not specific for the Ethernet driver, any entity that
tries to reserve more than 64 BF-enabled QPs should fail. Also, using
ranges is not necessary here and is wasteful.

The new mechanism introduced here will support reservation for
"Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
(when hypervisors support WC in VMs). The flow we use is:

1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
and request "BF enabled QPs" if BF is supported for the function

2. In the ALLOC_RES FW command, change param1 to:
a. param1[23:0] - number of QPs
b. param1[31:24] - flags controlling QPs reservation

Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
bits 6 and 7 unset in order to be used in Ethernet.

Bits 24-30 of the flags are currently reserved.

When a function tries to allocate a QP, it states the required attributes
for this QP. Those attributes are considered "best-effort". If an attribute,
such as Ethernet BF enabled QP, is a must-have attribute, the function has
to check that attribute is supported before trying to do the allocation.

In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
which are unsupported. If SRIOV is used, the PF validates those attributes
and masks out unsupported attributes as well. In order to notify VFs which
attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's
mailbox is filled by the PF, which notifies which QP allocation attributes
it supports.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
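
Two bit-level rules from the commit above lend themselves to a short sketch: a QPN may use BlueFlame only when bits 6 and 7 are clear, and `param1` of the allocation command carries the QP count in bits 23:0 with the Ethernet BF flag in bit 31. The macro and function names below are hypothetical; only the bit positions are taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BF_BLOCKED_QPN_BITS_SKETCH 0xc0u       /* bits 6 and 7 of the QPN */
#define ALLOC_COUNT_MASK_SKETCH    0x00ffffffu /* param1[23:0]: number of QPs */
#define ALLOC_FLAG_ETH_BF_SKETCH   (1u << 31)  /* param1 bit 31: Eth BF QPs */

/* A QPN can be used with BlueFlame only if bits 6 and 7 are both unset. */
static bool qpn_is_bf_eligible(uint32_t qpn)
{
    return (qpn & BF_BLOCKED_QPN_BITS_SKETCH) == 0;
}

/* Build param1 from a QP count and the optional Ethernet BF request. */
static uint32_t alloc_param1_pack(uint32_t count, bool want_eth_bf)
{
    uint32_t param1 = count & ALLOC_COUNT_MASK_SKETCH;

    if (want_eth_bf)
        param1 |= ALLOC_FLAG_ETH_BF_SKETCH;
    return param1;
}

/* Extract the QP count from param1. */
static uint32_t alloc_param1_count(uint32_t param1)
{
    return param1 & ALLOC_COUNT_MASK_SKETCH;
}

/* Check whether param1 asks for Ethernet BF-capable QPs. */
static bool alloc_param1_wants_bf(uint32_t param1)
{
    return (param1 & ALLOC_FLAG_ETH_BF_SKETCH) != 0;
}
```

In this sketch, a contiguous range starting at a 256-aligned base such as 0x100 stops being eligible once the offset sets bit 6, which is the limitation that one-by-one allocation with the flag set is meant to avoid.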

09 Dec, 2014 1 commit

net/mlx4_en: Support for configurable RSS hash function · 947cbb0a

Committed by Eyal Perry on Dec 02, 2014

The ConnectX HW is capable of using one of the following hash functions:
Toeplitz and an XOR hash function. This patch extends the implementation
of the mlx4_en driver set/get_rxfh callbacks to support getting and
setting the RSS hash function used by the device.
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Nov, 2014 1 commit

mlx4: fix mlx4_en_set_rxfh() · bd635c35

Committed by Eric Dumazet on Nov 22, 2014

mlx4_en_set_rxfh() can crash if no RSS indir table is provided.

While we are at it, allow RSS key to be changed with ethtool -X

Tested:

myhost:~# cat /proc/sys/net/core/netdev_rss_key
b6:89:91:f3:b2:c3:c2:90:11:e8:ce:45:e8:a9:9d:1c:f2:f6:d4:53:61:8b:26:3a:b3:9a:57:97:c3:b6:79:4d:2e:d9:66:5c:72:ed:b6:8e:c5:5d:4d:8c:22:67:30:ab:8a:6e:c3:6a

myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
    0:      0     1     2     3     4     5     6     7
RSS hash key:
b6:89:91:f3:b2:c3:c2:90:11:e8:ce:45:e8:a9:9d:1c:f2:f6:d4:53:61:8b:26:3a:b3:9a:57:97:c3:b6:79:4d:2e:d9:66:5c:72:ed:b6:8e

myhost:~# ethtool -X eth0 hkey \
03:0e:e2:43:fa:82:0e:73:14:2d:c0:68:21:9e:82:99:b9:84:d0:22:e2:b3:64:9f:4a:af:00:fa:cc:05:b4:4a:17:05:14:73:76:58:bd:2f

myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
    0:      0     1     2     3     4     5     6     7
RSS hash key:
03:0e:e2:43:fa:82:0e:73:14:2d:c0:68:21:9e:82:99:b9:84:d0:22:e2:b3:64:9f:4a:af:00:fa:cc:05:b4:4a:17:05:14:73:76:58:bd:2f
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: b9d1ab7e ("mlx4: use netdev_rss_key_fill() helper")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Nov, 2014 1 commit

net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE · f8c6455b

Committed by Shani Michaeli on Nov 09, 2014

When processing received traffic, pass CHECKSUM_COMPLETE status to the
stack, with calculated checksum for non TCP/UDP packets (such
as GRE or ICMP).

Although the stack expects checksum which doesn't include the pseudo
header, the HW adds it. To address that, we are subtracting the pseudo
header checksum from the checksum value provided by the HW.

In the IPv6 case, we also compute/add the IP header checksum which
is not added by the HW for such packets.

Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
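
Subtracting the pseudo-header contribution, as the commit above describes, relies on ones' complement arithmetic: subtracting a value is the same as adding its bitwise complement and folding the carry back in. The sketch below shows this on 16-bit sums in userspace C; it is not the driver's code, which would use the kernel's own checksum helpers, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t csum16_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones' complement addition of two 16-bit partial sums. */
static uint16_t csum16_add(uint16_t a, uint16_t b)
{
    return csum16_fold((uint32_t)a + (uint32_t)b);
}

/* Ones' complement subtraction: add the complement of b. */
static uint16_t csum16_sub(uint16_t a, uint16_t b)
{
    return csum16_add(a, (uint16_t)~b);
}
```

If a device reports `csum16_add(payload, pseudo)`, then `csum16_sub` of that value and `pseudo` recovers `payload` for the non-zero sums tested here; ones' complement has two representations of zero, so a sketch like this should not be relied on to distinguish 0x0000 from 0xffff.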

04 Nov, 2014 1 commit

net/mlx4_en: Remove RX buffers alignment to IP_ALIGN · 5f6e9800

Committed by Ido Shamay on Nov 02, 2014

When IP_ALIGN has a non zero value, hardware will write to a non aligned
address. The only reader from this address is when copying the header
from the first frag into the linear buffer (further access to the IP
address will be from the linear buffer, in which the headers are
aligned). Since the penalty of unaligned access by the hardware is
greater than that of the software memcpy, change frag_align to always be 0.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openeuler / Kernel synced successfully more than 1 year ago