- 29 December 2017, 1 commit
-
-
Submitted by Moni Shoua

Change the default mapping between TC and TCG as follows:

	 Prio | TC/TCG from | TC/TCG to
	      | (set by FW) | (set by SW)
	------+-------------+------------
	   0  |     0/0     |    0/7
	   1  |     1/0     |    0/6
	   2  |     2/0     |    0/5
	   3  |     3/0     |    0/4
	   4  |     4/0     |    0/3
	   5  |     5/0     |    0/2
	   6  |     6/0     |    0/1
	   7  |     7/0     |    0/0

With these new settings, a pause frame for any prio stops traffic for all prios.

Fixes: 564c274c ("net/mlx4_en: DCB QoS support")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
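A minimal C sketch of how such a software-side default mapping could be programmed, shown only to make the table above concrete; the struct and helper names are hypothetical placeholders, not the driver's actual code:

```c
#include <linux/types.h>

#define NUM_PRIOS 8

struct prio_map {			/* hypothetical per-prio mapping entry */
	u8 tc;				/* traffic class */
	u8 tcg;				/* traffic class group */
};

/* Program the SW default column of the table: every prio uses TC 0,
 * and the TCG mirrors the prio (prio 0 -> TCG 7, ..., prio 7 -> TCG 0). */
static void set_default_prio_map(struct prio_map map[NUM_PRIOS])
{
	int prio;

	for (prio = 0; prio < NUM_PRIOS; prio++) {
		map[prio].tc  = 0;
		map[prio].tcg = (NUM_PRIOS - 1) - prio;
	}
}
```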
-
- 08 November 2017, 1 commit
-
-
Submitted by Nogah Frankel

Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO to match the new convention.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 05 November 2017, 1 commit
-
-
Submitted by Jakub Kicinski

ndo_xdp is a control path callback for setting up XDP in the driver. We can reuse it for other forms of communication between the eBPF stack and the drivers. Rename the callback and the associated structures and definitions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
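For illustration, a hedged sketch of the shape the renamed hook takes: one callback dispatched on a command, which is what motivates moving from an XDP-specific name to ndo_bpf. example_ndo_bpf and example_xdp_set are hypothetical driver stubs, not the mlx4 implementation:

```c
#include <linux/errno.h>
#include <linux/netdevice.h>

/* Hypothetical helper: attach the new program, or detach if prog is NULL. */
static int example_xdp_set(struct net_device *dev, struct bpf_prog *prog)
{
	return 0;
}

/* The renamed control-path hook: dispatch on the requested command. */
static int example_ndo_bpf(struct net_device *dev, struct netdev_bpf *bpf)
{
	switch (bpf->command) {
	case XDP_SETUP_PROG:
		return example_xdp_set(dev, bpf->prog);
	default:
		return -EINVAL;
	}
}
```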
-
- 12 October 2017, 1 commit
-
-
Submitted by Tariq Toukan

In XDP_TX, some fields in tx_info and tx_desc are constants across all entries of the different XDP_TX rings. Assign values to these fields at ring creation time, rather than in the data path.

Patchset performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.

XDP_TX packet rate:
	------------------------------
	 Before    | After     | Gain |
	 13.7 Mpps | 14.0 Mpps | 2.2% |
	------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 October 2017, 1 commit
-
-
Submitted by Inbar Karmy

Limit the number of TX rings per UP by the number of cores in the system.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 30 August 2017, 1 commit
-
-
Submitted by Moshe Shemesh

Add support for updating the FW with the new port MAC when a port MAC change is requested by the user. This info is required by the FW, as OEM management tools read it directly from the NIC FW. Check the device capability bit to verify that the FW supports user MAC. If it does, use the set_port command to notify the FW of the new MAC. The feature is relevant only to the PF port MAC.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 08 August 2017, 4 commits
-
-
Submitted by Jiri Pirko

Get rid of struct tc_to_netdev, which is now just an unnecessary container, and instead pass the per-type structures down to drivers directly. Along with that, consolidate the naming of the per-type structure variables in cls_*.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Jiri Pirko

Change the return value from -EINVAL to -EOPNOTSUPP. The rest of the drivers do it like that, so be aligned.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Jiri Pirko

As ndo_setup_tc is a generic offload op for the whole tc subsystem, it does not really make sense to have cls-specific args. So move them under the cls_common structure, which is embedded in all cls structs.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Jiri Pirko

Since the type is always present, push it to be a separate argument to ndo_setup_tc. On the way, name the type enum and use it for the arg type.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
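For illustration, a hedged sketch of the callback shape after this series: the setup type becomes its own enum argument and the per-type struct is passed directly. The function names are hypothetical, and TC_SETUP_MQPRIO is the pre-rename enum value of this era (later TC_SETUP_QDISC_MQPRIO, see the November 2017 entry above):

```c
#include <linux/errno.h>
#include <linux/netdevice.h>

/* Hypothetical per-type handler for the mqprio offload data. */
static int example_setup_tc_mqprio(struct net_device *dev, void *type_data)
{
	return 0;
}

/* ndo_setup_tc after the series: (dev, enum tc_setup_type, void *type_data). */
static int example_setup_tc(struct net_device *dev, enum tc_setup_type type,
			    void *type_data)
{
	switch (type) {
	case TC_SETUP_MQPRIO:
		return example_setup_tc_mqprio(dev, type_data);
	default:
		return -EOPNOTSUPP;	/* aligned with the rest of the drivers */
	}
}
```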
-
- 24 July 2017, 1 commit
-
-
Submitted by Moshe Shemesh

Add visibility into the resource usage of QPs, CQs and counters used by virtual functions. This feature will be used to give the PF administrator more data while debugging VF status. Usage info was added to the ALLOC_RES command, to notify the PF whether the resource being reserved or allocated for the VF will be used by the kernel driver or by user verbs. The reservation and allocation functions of QP, CQ and counter were updated with an additional usage parameter.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 30 June 2017, 2 commits
-
-
Submitted by Inbar Karmy

Currently the number of TX queues that are allocated does not depend on the number of TCs; the module always loads with the maximum number of UPs per channel. To prevent the allocation of unnecessary memory, the module now loads with the minimum number of UPs per channel, and the user can control the number of TX queues per channel by raising the number of TCs to 8 using the tc command. The variable num_up holds the current number of UPs. Because of this change, the lines that set UP to a non-zero value in mlx4_en_select_queue() had to be removed, since by default only one TX queue per channel is now allocated. In order not to force UP to zero when there is only one TC, a condition was added before forcing it in mlx4_en_fill_qp_context().

Tested:
After the module is loaded with the minimum number of UPs per channel, increase the number of TCs to 8 with:
	tc qdisc add dev ens8 root mqprio num_tc 8
To decrease the number of TCs back to the minimum number of UPs per channel, use:
	tc qdisc del dev ens8 root

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: Tarick Bedeir <tarick@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Inbar Karmy

Until this patch, the number of UPs was hard coded to eight. Replace this with a variable in struct mlx4_en_port_profile. Currently, the variable holds the maximum number of UPs, as before. The patch creates the infrastructure needed to later add dynamic changes of the actual number of TCs.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: Tarick Bedeir <tarick@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 27 June 2017, 1 commit
-
-
Submitted by Colin Ian King

Trivial fix to a spelling mistake in an en_dbg debug message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 June 2017, 3 commits
-
-
Submitted by Martin KaFai Lau

Add support to mlx4 to report the bpf_prog ID during XDP_QUERY_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
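A hedged sketch of how a driver can report the attached program's ID on query. It assumes the driver caches the attached bpf_prog pointer in its private data, and it uses the struct name from the later ndo_bpf rename (the struct was called netdev_xdp at the time of this commit); example_priv is a hypothetical private structure:

```c
#include <linux/bpf.h>
#include <linux/netdevice.h>

struct example_priv {
	struct bpf_prog *xdp_prog;	/* assumed: cached attached program */
};

/* XDP_QUERY_PROG handler: report the program ID, 0 if nothing is attached. */
static int example_xdp_query(struct net_device *dev, struct netdev_bpf *xdp)
{
	struct example_priv *priv = netdev_priv(dev);
	const struct bpf_prog *prog = priv->xdp_prog;

	xdp->prog_id = prog ? prog->aux->id : 0;
	return 0;
}
```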
-
Submitted by Tariq Toukan

Instead of having their own NAPIs, XDP TX completion queues get polled within the corresponding RX NAPI. This prevents any possible race on TX ring prod/cons indices between the context that issues the transmits (RX NAPI) and the context that handles the completions (previously done in a separate NAPI). This also improves performance, as it decreases the number of NAPIs running on a CPU, saving the overhead of syncing and switching between the contexts.

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.

XDP_TX packet rate:
	-------------------------------------
	     | Before    | After     | Gain |
	IPv4 | 12.0 Mpps | 13.8 Mpps |  15% |
	IPv6 | 12.0 Mpps | 13.8 Mpps |  15% |
	-------------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Saeed Mahameed

Avoid touching the RX QP RSS context when loading with only one RX ring, to allow optimized A0 RX steering.

Enable by:
- loading mlx4_core with the module param: log_num_mgm_entry_size = -6.
- then: ethtool -L <interface> rx 1

Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

XDP_DROP packet rate:
	-------------------------------------
	     | Before    | After     | Gain |
	IPv4 | 20.5 Mpps | 28.1 Mpps |  37% |
	IPv6 | 18.4 Mpps | 28.1 Mpps |  53% |
	-------------------------------------

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 08 June 2017, 1 commit
-
-
Submitted by Jiri Pirko

We need to push the chain index down to the drivers, so they have the information about which chain the rule belongs to. For now, no driver supports multichain offload, so only chain 0 is supported. This is needed to prevent chain squashes during offload for now. Later this will be used to implement multichain offload.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
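A hedged sketch of the chain-0-only check this enables in drivers. The field placement follows the later common-struct layout (cls_common) rather than the exact form at the time of this commit, and the function name is hypothetical:

```c
#include <linux/errno.h>
#include <net/pkt_cls.h>

/* Until multichain offload exists, reject anything but chain 0. */
static int example_setup_tc_flower(struct net_device *dev,
				   struct tc_cls_flower_offload *cls)
{
	if (cls->common.chain_index)
		return -EOPNOTSUPP;

	/* ... translate and program the rule for chain 0 ... */
	return 0;
}
```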
-
- 22 May 2017, 1 commit
-
-
Submitted by Miroslav Lichvar

Include HWTSTAMP_FILTER_NTP_ALL in net_hwtstamp_validate() as a valid filter, and update drivers which can timestamp all packets, or which explicitly list unsupported filters instead of using a default case, to handle the filter.

CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
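A hedged sketch of how a driver that can timestamp all RX packets might fold the new filter into its existing "timestamp everything" case. Illustrative only, not the exact mlx4 hunk; the function name and error choice are assumptions:

```c
#include <linux/errno.h>
#include <linux/net_tstamp.h>

/* Validate the requested RX filter; anything the HW can only satisfy by
 * timestamping all packets (including NTP_ALL) is mapped to FILTER_ALL. */
static int example_validate_rx_filter(struct hwtstamp_config *config)
{
	switch (config->rx_filter) {
	case HWTSTAMP_FILTER_NONE:
		return 0;
	case HWTSTAMP_FILTER_ALL:
	case HWTSTAMP_FILTER_NTP_ALL:
		config->rx_filter = HWTSTAMP_FILTER_ALL;
		return 0;
	default:
		return -ERANGE;
	}
}
```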
-
- 16 March 2017, 1 commit
-
-
Submitted by Amritha Nambiar

The configurable priority-to-traffic-class mapping and the user-specified queue ranges are used to configure the traffic class, overriding the hardware defaults when the 'hw' option is set to 0. However, when the 'hw' option is non-zero, the hardware QOS defaults are used. This patch makes it so that we can pass the data the user provided to ndo_setup_tc. This allows us to pull in the queue configuration if the user requested it, as well as any additional hardware offload type requested by using a value other than 1 for the hw value. Finally, it also provides a means for the device driver to return the level supported for the offload type via the qopt->hw value. Previously we were always assuming the value to be 1; in the future, values beyond just 1 may be supported.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 February 2017, 1 commit
-
-
Submitted by Eugenia Emantayev

Spoofcheck can't be enabled if the VF MAC is zero. Vice versa, the MAC can't be zeroed while spoofcheck is on.

Fixes: 8f7ba3ca ('net/mlx4: Add set VF mac address support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
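A hedged sketch of the two validation rules described above. The function names are hypothetical and the error code is an assumption; only the pair of checks matters:

```c
#include <linux/errno.h>
#include <linux/etherdevice.h>

/* Rule 1: refuse to enable spoofcheck while the VF MAC is all zeros. */
static int example_set_vf_spoofchk(const u8 *vf_mac, bool enable)
{
	if (enable && is_zero_ether_addr(vf_mac))
		return -EPERM;
	return 0;
}

/* Rule 2: refuse to zero the VF MAC while spoofcheck is enabled. */
static int example_set_vf_mac(const u8 *new_mac, bool spoofchk_enabled)
{
	if (is_zero_ether_addr(new_mac) && spoofchk_enabled)
		return -EPERM;
	return 0;
}
```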
-
- 20 February 2017, 1 commit
-
-
Submitted by Eric Dumazet

1) In the case where rate == priv->pkt_rate_low == priv->pkt_rate_high, mlx4_en_auto_moderation() does a divide by zero.

2) We want to properly change the moderation parameters if rx_frames was changed (like in ethtool -C eth0 rx-frames 16).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
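A hedged sketch of the kind of rate interpolation that needs the guard in point 1. The names follow the commit text; the exact mlx4 expression differs, so treat this only as an illustration of why rate_high == rate_low must be handled before dividing:

```c
/* Map a measured packet rate to a 0..100 moderation level, guarding the
 * degenerate case where both thresholds are equal (divide by zero). */
static unsigned int example_moder_level(unsigned long rate,
					unsigned long rate_low,
					unsigned long rate_high)
{
	if (rate <= rate_low)
		return 0;
	if (rate >= rate_high || rate_high == rate_low)	/* guard */
		return 100;

	return (100 * (rate - rate_low)) / (rate_high - rate_low);
}
```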
-
- 03 February 2017, 2 commits
-
-
Submitted by Martin KaFai Lau

After calling mlx4_en_try_alloc_resources() (e.g. by changing the number of rx-queues with ethtool -L), the existing xdp_prog becomes inactive. The bug is that the xdp_prog pointer has not been carried over from the old rx-queues to the new rx-queues.

Fixes: 47a38e15 ("net/mlx4_en: add support for fast rx drop bpf program")
Cc: Brenden Blanco <bblanco@plumgrid.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Martin KaFai Lau

In mlx4_en_update_priv(), dst->tx_ring[t] and dst->tx_cq[t] are over-written by src->tx_ring[t] and src->tx_cq[t] without first calling kfree. One of the reproducible code paths is by doing 'ethtool -L'. The fix is to do the kfree in mlx4_en_free_resources().

Here is the kmemleak report:

	unreferenced object 0xffff880841211800 (size 2048):
	  comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s)
	  hex dump (first 32 bytes):
	    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
	    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
	  backtrace:
	    [<ffffffff81930718>] kmemleak_alloc+0x28/0x50
	    [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
	    [<ffffffff8170e0a8>] mlx4_en_try_alloc_resources+0x118/0x1a0
	    [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
	    [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190
	    [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0
	    [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
	    [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
	    [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
	    [<ffffffff812480f9>] SyS_ioctl+0x79/0x90
	    [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
	    [<ffffffffffffffff>] 0xffffffffffffffff
	unreferenced object 0xffff880841213000 (size 2048):
	  comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s)
	  hex dump (first 32 bytes):
	    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
	    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
	  backtrace:
	    [<ffffffff81930718>] kmemleak_alloc+0x28/0x50
	    [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
	    [<ffffffff8170e0cb>] mlx4_en_try_alloc_resources+0x13b/0x1a0
	    [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
	    [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190
	    [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0
	    [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
	    [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
	    [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
	    [<ffffffff812480f9>] SyS_ioctl+0x79/0x90
	    [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
	    [<ffffffffffffffff>] 0xffffffffffffffff

	(gdb) list *mlx4_en_try_alloc_resources+0x118
	0xffffffff8170e0a8 is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2145).
	2140		if (!dst->tx_ring_num[t])
	2141			continue;
	2142
	2143		dst->tx_ring[t] = kzalloc(sizeof(struct mlx4_en_tx_ring *) *
	2144					  MAX_TX_RINGS, GFP_KERNEL);
	2145		if (!dst->tx_ring[t])
	2146			goto err_free_tx;
	2147
	2148		dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
	2149					MAX_TX_RINGS, GFP_KERNEL);

	(gdb) list *mlx4_en_try_alloc_resources+0x13b
	0xffffffff8170e0cb is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2150).
	2145		if (!dst->tx_ring[t])
	2146			goto err_free_tx;
	2147
	2148		dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
	2149					MAX_TX_RINGS, GFP_KERNEL);
	2150		if (!dst->tx_cq[t]) {
	2151			kfree(dst->tx_ring[t]);
	2152			goto err_free_tx;
	2153		}
	2154	}

Fixes: ec25bc04 ("net/mlx4_en: Add resilience in low memory systems")
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 31 January 2017, 1 commit
-
-
Submitted by Shaker Daibes

When starting the port, the driver will inform the firmware about the actual MTU, which does not include implicit headers such as FCS or VLAN tags.

Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 January 2017, 1 commit
-
-
Submitted by Eric Dumazet

Disable BH around the call to napi_schedule() to avoid the following warning:

	[   52.095499] NOHZ: local_softirq_pending 08
	[   52.421291] NOHZ: local_softirq_pending 08
	[   52.608313] NOHZ: local_softirq_pending 08

Fixes: 8d59de8f ("net/mlx4_en: Process all completions in RX rings after port goes up")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
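A hedged sketch of the pattern behind this fix: napi_schedule() raises a softirq, so a process-context caller wraps it in local_bh_disable()/local_bh_enable() to give the softirq a chance to run. Illustrative, not the exact mlx4 hunk:

```c
#include <linux/bottom_half.h>
#include <linux/netdevice.h>

/* Kick a RX NAPI from process context without leaving the softirq pending. */
static void example_kick_rx_napi(struct napi_struct *napi)
{
	local_bh_disable();
	napi_schedule(napi);
	local_bh_enable();
}
```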
-
- 11 January 2017, 1 commit
-
-
Submitted by Martin KaFai Lau

In commit b45f0674 ("mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs"), EOPNOTSUPP was changed to ENOTSUPP by mistake. This patch fixes it.

Fixes: b45f0674 ("mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 January 2017, 1 commit
-
-
Submitted by Stephen Hemminger

The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
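A hedged sketch of the resulting void callback shape: the handler fills the caller-provided rtnl_link_stats64 and returns nothing. The private struct and its fields are hypothetical, not the mlx4 counters:

```c
#include <linux/netdevice.h>
#include <linux/types.h>

struct example_priv {			/* hypothetical software counters */
	u64 rx_packets, rx_bytes;
	u64 tx_packets, tx_bytes;
};

/* ndo_get_stats64 after this change: void return, stats filled in place. */
static void example_get_stats64(struct net_device *dev,
				struct rtnl_link_stats64 *stats)
{
	struct example_priv *priv = netdev_priv(dev);

	stats->rx_packets = priv->rx_packets;
	stats->rx_bytes   = priv->rx_bytes;
	stats->tx_packets = priv->tx_packets;
	stats->tx_bytes   = priv->tx_bytes;
}
```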
-
- 24 December 2016, 1 commit
-
-
Submitted by Tariq Toukan

The user prio field is wrong (and overflows) in the XDP forward flow. This is a result of a bad value for num_tx_rings_p_up, which should account for all XDP TX rings, as they operate for the same user prio.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 December 2016, 3 commits
-
-
Submitted by Martin KaFai Lau

Reserve XDP_PACKET_HEADROOM for the packet and enable bpf_xdp_adjust_head() support. This patch only affects the code path when XDP is active. After testing, the tx_dropped counter is incremented if the xdp_prog sends more than wire MTU.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Martin KaFai Lau

When XDP is active in mlx4, mlx4 uses one page per packet. At the same time (i.e. when XDP is active), it currently limits the MTU to FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN), which is 1514 on x86. AFAICT, we can at least raise the MTU limit up to PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN), which is what this patch does. It will be useful in the next patch, which allows an XDP program to extend the packet by adding new header(s). Note: In the earlier XDP patches, there is already an existing guard to ensure the page/pkt scheme only applies when XDP is active in mlx4.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
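A hedged sketch of the MTU ceiling computation described above for the one-page-per-packet XDP scheme; the helper name is hypothetical, the formula is the one stated in the commit text:

```c
#include <linux/if_ether.h>
#include <linux/if_vlan.h>
#include <linux/mm.h>

/* Largest MTU that still fits one XDP packet in a single page,
 * leaving room for the ethernet header and two VLAN tags. */
static inline int example_xdp_max_mtu(void)
{
	return PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN);
}
```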
-
Submitted by Martin KaFai Lau

This patch allows an XDP prog to extend/remove the packet data at the head (like adding or removing a header). It is done by adding a new XDP helper, bpf_xdp_adjust_head(). It also renames bpf_helper_changes_skb_data() to bpf_helper_changes_pkt_data() to better reflect that an XDP prog does not work on skbs. This patch adds one "xdp_adjust_head" bit to bpf_prog for the XDP-capable driver to check if the XDP prog requires bpf_xdp_adjust_head() support. The driver can then decide to error out during XDP_SETUP_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
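A hedged sketch of an XDP program using the new helper, written in current libbpf style rather than the 2016-era headers; the 4-byte "custom header" is purely illustrative, and a real program would also rebuild the ethernet header it exposed:

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_push_4_bytes(struct xdp_md *ctx)
{
	/* Negative delta moves the head earlier, growing the packet. */
	if (bpf_xdp_adjust_head(ctx, -4))
		return XDP_DROP;	/* not enough headroom */

	void *data     = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;

	if (data + 4 > data_end)	/* verifier-required bounds check */
		return XDP_DROP;

	__builtin_memset(data, 0, 4);	/* placeholder custom header */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```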
-
- 03 December 2016, 1 commit
-
-
Submitted by Eric Dumazet

My recent commit to get more precise rx/tx counters in ndo_get_stats64() can lead to crashes at device dismantle, as Jesper found out. We must prevent mlx4_en_fold_software_stats() from trying to access tx/rx rings if they are deleted. Fix this by adding a test against priv->port_up in mlx4_en_fold_software_stats(). Calling mlx4_en_fold_software_stats() from mlx4_en_stop_port() allows us to eventually broadcast the latest/current counters to rtnetlink monitors.

Fixes: 40931b85 ("mlx4: give precise rx/tx bytes/packets counters")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-and-bisected-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@dev.mellanox.co.il>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 30 November 2016, 1 commit
-
-
Submitted by Eric Dumazet

mlx4 stats are chaotic because a deferred work queue is responsible for updating them every 250 ms. Even sampling stats every second with "sar -n DEV 1" gives variations like the following:

	lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
	07:39:22    eth0 146877.00 3265554.00   9467.15 4828168.50
	07:39:23    eth0 146587.00 3260329.00   9448.15 4820445.98
	07:39:24    eth0 146894.00 3259989.00   9468.55 4819943.26
	07:39:25    eth0 110368.00 2454497.00   7113.95 3629012.17 <<>>
	07:39:26    eth0 146563.00 3257502.00   9447.25 4816266.23
	07:39:27    eth0 145678.00 3258292.00   9389.79 4817414.39
	07:39:28    eth0 145268.00 3253171.00   9363.85 4809852.46
	07:39:29    eth0 146439.00 3262185.00   9438.97 4823172.48
	07:39:30    eth0 146758.00 3264175.00   9459.94 4826124.13
	07:39:31    eth0 146843.00 3256903.00   9465.44 4815381.97
	Average:    eth0 142827.50 3179259.70   9206.30 4700578.16

This patch allows rx/tx bytes/packets counters to be folded at the time we need stats. We now can fetch stats every 1 ms if we want to check NIC behavior on a small time window. It is also easier to detect anomalies.

	lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
	07:42:50    eth0 142915.00 3177696.00   9212.06 4698270.42
	07:42:51    eth0 143741.00 3200232.00   9265.15 4731593.02
	07:42:52    eth0 142781.00 3171600.00   9202.92 4689260.16
	07:42:53    eth0 143835.00 3192932.00   9271.80 4720761.39
	07:42:54    eth0 141922.00 3165174.00   9147.64 4679759.21
	07:42:55    eth0 142993.00 3207038.00   9216.78 4741653.05
	07:42:56    eth0 141394.06 3154335.64   9113.85 4663731.73
	07:42:57    eth0 141850.00 3161202.00   9144.48 4673866.07
	07:42:58    eth0 143439.00 3180736.00   9246.05 4702755.35
	07:42:59    eth0 143501.00 3210992.00   9249.99 4747501.84
	Average:    eth0 142835.66 3182165.93   9206.98 4704874.08

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 29 November 2016, 1 commit
-
-
Submitted by Tariq Toukan

This reverts commit 9d769311. Using unregister_netdev at shutdown flow prevents calling the netdev's ndos or trying to access its freed resources.

This fixes crashes like the following:

	Call Trace:
	 [<ffffffff81587a6e>] dev_get_phys_port_id+0x1e/0x30
	 [<ffffffff815a36ce>] rtnl_fill_ifinfo+0x4be/0xff0
	 [<ffffffff815a53f3>] rtmsg_ifinfo_build_skb+0x73/0xe0
	 [<ffffffff815a5476>] rtmsg_ifinfo.part.27+0x16/0x50
	 [<ffffffff815a54c8>] rtmsg_ifinfo+0x18/0x20
	 [<ffffffff8158a6c6>] netdev_state_change+0x46/0x50
	 [<ffffffff815a5e78>] linkwatch_do_dev+0x38/0x50
	 [<ffffffff815a6165>] __linkwatch_run_queue+0xf5/0x170
	 [<ffffffff815a6205>] linkwatch_event+0x25/0x30
	 [<ffffffff81099a82>] process_one_work+0x152/0x400
	 [<ffffffff8109a325>] worker_thread+0x125/0x4b0
	 [<ffffffff8109a200>] ? rescuer_thread+0x350/0x350
	 [<ffffffff8109fc6a>] kthread+0xca/0xe0
	 [<ffffffff8109fba0>] ? kthread_park+0x60/0x60
	 [<ffffffff816a1285>] ret_from_fork+0x25/0x30

Fixes: 9d769311 ("net/mlx4_en: Avoid unregister_netdev at shutdown flow")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 28 November 2016, 1 commit
-
-
Submitted by Eric Dumazet

Per RX ring packets/bytes counters are not protected by the global priv->stats_lock. Better not confuse the reader, and use READ_ONCE() to show we read these counters without surrounding synchronization. Interrupt moderation is best effort, and we do not really care about ultra precise counters.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
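A hedged sketch of the annotation pattern this commit describes: reading per-ring counters that a concurrent writer may update, without taking the stats lock. The ring struct and field names are illustrative stand-ins:

```c
#include <linux/compiler.h>

struct example_rx_ring {		/* illustrative per-ring counters */
	unsigned long packets;
	unsigned long bytes;
};

/* READ_ONCE() documents (and enforces) a single, intentionally racy load. */
static void example_sample_ring(const struct example_rx_ring *ring,
				unsigned long *pkts, unsigned long *bytes)
{
	*pkts  = READ_ONCE(ring->packets);
	*bytes = READ_ONCE(ring->bytes);
}
```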
-
- 25 November 2016, 1 commit
-
-
Submitted by Eric Dumazet

Goal is to reorganize this critical structure to increase performance. ndo_start_xmit() should only dirty one cache line, and access as few cache lines as possible. Add an sp_ (Slow Path) prefix to fields that are not used in the fast path, to make clear what is going on.

After this patch pahole reports something much better, as all ndo_start_xmit() needed fields are packed into two cache lines instead of seven or eight:

	struct mlx4_en_tx_ring {
		u32                       last_nr_txbb;        /*     0   0x4 */
		u32                       cons;                /*   0x4   0x4 */
		long unsigned int         wake_queue;          /*   0x8   0x8 */
		struct netdev_queue *     tx_queue;            /*  0x10   0x8 */
		u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x18 0x8 */
		struct mlx4_en_rx_ring *  recycle_ring;        /*  0x20   0x8 */

		/* XXX 24 bytes hole, try to pack */

		/* --- cacheline 1 boundary (64 bytes) --- */
		u32                       prod;                /*  0x40   0x4 */
		unsigned int              tx_dropped;          /*  0x44   0x4 */
		long unsigned int         bytes;               /*  0x48   0x8 */
		long unsigned int         packets;             /*  0x50   0x8 */
		long unsigned int         tx_csum;             /*  0x58   0x8 */
		long unsigned int         tso_packets;         /*  0x60   0x8 */
		long unsigned int         xmit_more;           /*  0x68   0x8 */
		struct mlx4_bf            bf;                  /*  0x70  0x18 */
		/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
		__be32                    doorbell_qpn;        /*  0x88   0x4 */
		__be32                    mr_key;              /*  0x8c   0x4 */
		u32                       size;                /*  0x90   0x4 */
		u32                       size_mask;           /*  0x94   0x4 */
		u32                       full_size;           /*  0x98   0x4 */
		u32                       buf_size;            /*  0x9c   0x4 */
		void *                    buf;                 /*  0xa0   0x8 */
		struct mlx4_en_tx_info *  tx_info;             /*  0xa8   0x8 */
		int                       qpn;                 /*  0xb0   0x4 */
		u8                        queue_index;         /*  0xb4   0x1 */
		bool                      bf_enabled;          /*  0xb5   0x1 */
		bool                      bf_alloced;          /*  0xb6   0x1 */
		u8                        hwtstamp_tx_type;    /*  0xb7   0x1 */
		u8 *                      bounce_buf;          /*  0xb8   0x8 */
		/* --- cacheline 3 boundary (192 bytes) --- */
		long unsigned int         queue_stopped;       /*  0xc0   0x8 */
		struct mlx4_hwq_resources sp_wqres;            /*  0xc8  0x58 */
		/* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */
		struct mlx4_qp            sp_qp;               /* 0x120  0x30 */
		/* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */
		struct mlx4_qp_context    sp_context;          /* 0x150  0xf8 */
		/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
		cpumask_t                 sp_affinity_mask;    /* 0x248  0x20 */
		enum mlx4_qp_state        sp_qp_state;         /* 0x268   0x4 */
		u16                       sp_stride;           /* 0x26c   0x2 */
		u16                       sp_cqn;              /* 0x26e   0x2 */

		/* size: 640, cachelines: 10, members: 36 */
		/* sum members: 600, holes: 1, sum holes: 24 */
		/* padding: 16 */
	};

Instead of this silly placement:

	struct mlx4_en_tx_ring {
		u32                       last_nr_txbb;        /*     0   0x4 */
		u32                       cons;                /*   0x4   0x4 */
		long unsigned int         wake_queue;          /*   0x8   0x8 */

		/* XXX 48 bytes hole, try to pack */

		/* --- cacheline 1 boundary (64 bytes) --- */
		u32                       prod;                /*  0x40   0x4 */

		/* XXX 4 bytes hole, try to pack */

		long unsigned int         bytes;               /*  0x48   0x8 */
		long unsigned int         packets;             /*  0x50   0x8 */
		long unsigned int         tx_csum;             /*  0x58   0x8 */
		long unsigned int         tso_packets;         /*  0x60   0x8 */
		long unsigned int         xmit_more;           /*  0x68   0x8 */
		unsigned int              tx_dropped;          /*  0x70   0x4 */

		/* XXX 4 bytes hole, try to pack */

		struct mlx4_bf            bf;                  /*  0x78  0x18 */
		/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
		long unsigned int         queue_stopped;       /*  0x90   0x8 */
		cpumask_t                 affinity_mask;       /*  0x98  0x10 */
		struct mlx4_qp            qp;                  /*  0xa8  0x30 */
		/* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
		struct mlx4_hwq_resources wqres;               /*  0xd8  0x58 */
		/* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */
		u32                       size;                /* 0x130   0x4 */
		u32                       size_mask;           /* 0x134   0x4 */
		u16                       stride;              /* 0x138   0x2 */

		/* XXX 2 bytes hole, try to pack */

		u32                       full_size;           /* 0x13c   0x4 */
		/* --- cacheline 5 boundary (320 bytes) --- */
		u16                       cqn;                 /* 0x140   0x2 */

		/* XXX 2 bytes hole, try to pack */

		u32                       buf_size;            /* 0x144   0x4 */
		__be32                    doorbell_qpn;        /* 0x148   0x4 */
		__be32                    mr_key;              /* 0x14c   0x4 */
		void *                    buf;                 /* 0x150   0x8 */
		struct mlx4_en_tx_info *  tx_info;             /* 0x158   0x8 */
		struct mlx4_en_rx_ring *  recycle_ring;        /* 0x160   0x8 */
		u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x168 0x8 */
		u8 *                      bounce_buf;           /* 0x170   0x8 */
		struct mlx4_qp_context    context;              /* 0x178  0xf8 */
		/* --- cacheline 9 boundary (576 bytes) was 48 bytes ago --- */
		int                       qpn;                  /* 0x270   0x4 */
		enum mlx4_qp_state        qp_state;             /* 0x274   0x4 */
		u8                        queue_index;           /* 0x278   0x1 */
		bool                      bf_enabled;            /* 0x279   0x1 */
		bool                      bf_alloced;            /* 0x27a   0x1 */

		/* XXX 5 bytes hole, try to pack */

		/* --- cacheline 10 boundary (640 bytes) --- */
		struct netdev_queue *     tx_queue;              /* 0x280   0x8 */
		int                       hwtstamp_tx_type;      /* 0x288   0x4 */

		/* size: 704, cachelines: 11, members: 36 */
		/* sum members: 587, holes: 6, sum holes: 65 */
		/* padding: 52 */
	};

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 November 2016, 1 commit
-
-
Submitted by Tariq Toukan

Make sure mlx4_en_free_resources is called under the netdev state lock. This is needed since the RCU dereference of the XDP prog should be protected.

Fixes: 326fe02d ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sagi Grimberg <sagi@grimberg.me>
CC: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 13 November 2016, 1 commit
-
-
Submitted by Daniel Borkmann

Commit 67f8b1dc ("net/mlx4_en: Refactor the XDP forwarding rings scheme") added a bug in that the prog's reference count is not dropped in the error path when mlx4_en_try_alloc_resources() fails from mlx4_xdp_set(). We previously took bpf_prog_add(prog, priv->rx_ring_num - 1), which we need to release again. Earlier in the call path, dev_change_xdp_fd() itself holds a reference to the prog as well (hence the '- 1' in the bpf_prog_add()), so a simple atomic_sub() is safe to use here. When an error is propagated, bpf_prog_put() is eventually called from dev_change_xdp_fd().

Fixes: 67f8b1dc ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 November 2016, 1 commit
-
-
Submitted by Tariq Toukan

This reverts commit 9d2afba0. The original issue would only exist if an external module tried calling our ethtool_ops without checking whether they still exist. The right way to solve that is by simply doing the check on the caller side. Currently, no action is required as there is no such use case.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-