Commits · 9a60c9072295b30459284beca9aff52be8dfd64b · openanolis / cloud-kernel

18 Dec, 2016 1 commit

mlxsw: spectrum: Mark split ports as such · 9a60c907

Ido Schimmel authored Dec 16, 2016

When a port is split we should mark it as such, as otherwise the split
ports aren't renamed correctly (e.g. sw1p3 -> sw1p3s1) and the unsplit
operation fails:

$ devlink port split sw1p3 count 4
$ devlink port unsplit eth0
devlink answers: Invalid argument
[  598.565307] mlxsw_spectrum 0000:03:00.0 eth0: Port wasn't split

Fixes: 67963a33 ("mlxsw: Make devlink port instances independent of spectrum/switchx2 port instances")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Reviewed-by: Elad Raz <eladr@mellanox.com>
Tested-by: Tamir Winetroub <tamirw@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a60c907

10 Dec, 2016 1 commit

net: mlx5: Fix Kconfig help text · d33695fb

Christopher Covington authored Dec 09, 2016

Since the following commit, Infiniband and Ethernet have not been
mutually exclusive.

Fixes: 4aa17b28 ("mlx5: Enable mutual support for IB and Ethernet")
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d33695fb

09 Dec, 2016 5 commits

net/mlx5e: use %pad format string for dma_addr_t · 9afd8952

Arnd Bergmann authored Dec 08, 2016

On 32-bit ARM with 64-bit dma_addr_t I get this warning about an
incorrect format string:

In file included from /git/arm-soc/drivers/net/ethernet/mellanox/mlx5/core/alloc.c:42:0:
drivers/net/ethernet/mellanox/mlx5/core/alloc.c: In function ‘mlx5_frag_buf_alloc_node’:
drivers/net/ethernet/mellanox/mlx5/core/alloc.c:134:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]

We have the special %pad format for printing dma_addr_t, so use that
to print the correct address and avoid the warning.

Fixes: 1c1b5228 ("net/mlx5e: Implement Fragmented Work Queue (WQ)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

9afd8952

mlx4: xdp: Reserve headroom for receiving packet when XDP prog is active · ea3349a0

Martin KaFai Lau authored Dec 07, 2016

Reserve XDP_PACKET_HEADROOM for packet and enable bpf_xdp_adjust_head()
support.  This patch only affects the code path when XDP is active.

After testing, the tx_dropped counter is incremented if the xdp_prog sends
more than wire MTU.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ea3349a0

mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs · b45f0674

Martin KaFai Lau authored Dec 07, 2016

When XDP is active in mlx4, mlx4 is using one page/pkt.
At the same time (i.e. when XDP is active), it is currently
limiting MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN)
which is 1514 in x86.  AFAICT, we can at least raise the MTU
limit up to PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this
patch is doing.  It will be useful in the next patch which
allows XDP program to extend the packet by adding new header(s).

Note: The earlier XDP patches already added a guard to ensure
that the page/pkt scheme only applies when XDP is active
in mlx4.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b45f0674
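The MTU limits quoted in this commit message can be checked with a few lines of arithmetic. The sketch below is illustrative only: ETH_HLEN and VLAN_HLEN carry their standard values, while EXAMPLE_FRAG_SZ0 (inferred from the 1514 figure above) and EXAMPLE_PAGE_SIZE (a typical x86 page) are assumptions, and max_mtu_for_buffer() is a hypothetical helper, not a function from the driver.

```c
/* Standard Ethernet header sizes. */
#define ETH_HLEN   14	/* destination MAC + source MAC + EtherType */
#define VLAN_HLEN   4	/* one 802.1Q tag */

/* Assumed values for illustration; not taken from the patch. */
#define EXAMPLE_FRAG_SZ0   1536	/* inferred from the 1514 limit quoted above */
#define EXAMPLE_PAGE_SIZE  4096	/* typical x86 page size */

/*
 * Largest MTU that fits in a buffer of buf_size bytes once room is
 * reserved for an Ethernet header and two stacked VLAN tags.
 */
int max_mtu_for_buffer(int buf_size)
{
	return buf_size - ETH_HLEN - (2 * VLAN_HLEN);
}
```

Under these assumptions, max_mtu_for_buffer(EXAMPLE_FRAG_SZ0) gives the old limit of 1514 and max_mtu_for_buffer(EXAMPLE_PAGE_SIZE) gives the raised limit of 4074.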

bpf: xdp: Allow head adjustment in XDP prog · 17bedab2

Martin KaFai Lau authored Dec 07, 2016

This patch allows XDP prog to extend/remove the packet
data at the head (like adding or removing header).  It is
done by adding a new XDP helper bpf_xdp_adjust_head().

It also renames bpf_helper_changes_skb_data() to
bpf_helper_changes_pkt_data() to better reflect
that XDP prog does not work on skb.

This patch adds one "xdp_adjust_head" bit to bpf_prog for the
XDP-capable driver to check if the XDP prog requires
bpf_xdp_adjust_head() support.  The driver can then decide
to error out during XDP_SETUP_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17bedab2
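A userspace model can illustrate what adjusting the packet head means. This is a minimal sketch, not the kernel helper: struct example_xdp_buff and example_xdp_adjust_head() are hypothetical names, and the bounds rule (stay within the reserved headroom, leave at least one byte of packet) is a simplification of whatever checks the real bpf_xdp_adjust_head() performs.

```c
/* Hypothetical, simplified stand-in for the XDP buffer descriptor. */
struct example_xdp_buff {
	unsigned char *data_hard_start;	/* start of the reserved headroom */
	unsigned char *data;		/* first byte of packet data */
	unsigned char *data_end;	/* one past the last byte of packet data */
};

/*
 * Move the start of the packet by delta bytes.
 * A negative delta grows the packet into the headroom (adding a header);
 * a positive delta shrinks it (removing a header).
 * Returns 0 on success, -1 if the move would leave the buffer bounds.
 */
int example_xdp_adjust_head(struct example_xdp_buff *xdp, int delta)
{
	long headroom = (long)(xdp->data - xdp->data_hard_start);
	long len = (long)(xdp->data_end - xdp->data);
	long d = delta;

	if (d < 0 && -d > headroom)
		return -1;	/* not enough headroom to grow */
	if (d > 0 && d >= len)
		return -1;	/* would remove the whole packet */

	xdp->data += d;
	return 0;
}
```

The model also shows why a driver has to reserve headroom before it can support the helper: with no space between data_hard_start and data, every negative delta fails.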

net/mlx5e: Offload TC matching on packets being IP fragments · 3f7d0eb4

Or Gerlitz authored Dec 07, 2016

Enable offloading of TC rules that match on packets being IP fragments.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f7d0eb4

07 Dec, 2016 6 commits

net/mlx5e: Change the SQ/RQ operational state to positive logic · c0f1147d

Mohamad Haj Yahia authored Dec 06, 2016

With negative logic (i.e. a FLUSH state), there is a time interval after
the RQ/SQ is reopened during which it is not really ready, yet the state
indicates that it is not in FLUSH state because the initial SQ/RQ struct
memory starts as zeros.
The state now indicates whether the SQ/RQ is opened, and the READY state
is set only after all the SQ/RQ resources have been prepared.

Fixes: 6e8dd6d6 ("net/mlx5e: Don't wait for SQ completions on close")
Fixes: f2fde18c ("net/mlx5e: Don't wait for RQ completions on close")
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0f1147d
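The difference between the two state encodings shows up on a freshly zeroed structure. The following sketch uses hypothetical names (struct example_queue and the EXAMPLE_STATE_* bits) rather than the driver's own definitions, and only models the reasoning in the commit message.

```c
/* Hypothetical state bits; the driver's real flag names may differ. */
#define EXAMPLE_STATE_FLUSH  (1UL << 0)	/* negative logic: set while NOT usable */
#define EXAMPLE_STATE_READY  (1UL << 1)	/* positive logic: set once usable */

struct example_queue {
	unsigned long state;	/* zero right after the struct is (re)allocated */
};

/* Negative logic: a zeroed queue looks usable although nothing is set up. */
int example_usable_negative_logic(const struct example_queue *q)
{
	return !(q->state & EXAMPLE_STATE_FLUSH);
}

/* Positive logic: a zeroed queue is unusable until READY is set explicitly. */
int example_usable_positive_logic(const struct example_queue *q)
{
	return (q->state & EXAMPLE_STATE_READY) != 0;
}
```

With state equal to zero, the negative-logic check reports the queue as usable while the positive-logic check does not, which is the window the commit closes.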

net/mlx5e: Don't flush SQ on error · 3c8591d5

Saeed Mahameed authored Dec 06, 2016

We are doing SQ descriptors cleanup in driver.

Fixes: 6e8dd6d6 ("net/mlx5e: Don't wait for SQ completions on close")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c8591d5

net/mlx5e: Don't notify HW when filling the edge of ICO SQ · b8335d91

Saeed Mahameed authored Dec 06, 2016

We are going to do this a couple of steps ahead anyway.

Fixes: d3c9bc27 ("net/mlx5e: Added ICO SQs")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8335d91

net/mlx5: Fix query ISSI flow · f9c14e46

Kamal Heib authored Dec 06, 2016

In old FWs query ISSI command is not supported and for some of those FWs
it might fail with status other than "MLX5_CMD_STAT_BAD_OP_ERR".

In such case instead of failing the driver load, we will treat any FW
status other than 0 for Query ISSI FW command as ISSI not supported and
assume ISSI=0 (most basic driver/FW interface).

In case of driver syndrome (query ISSI failure by driver) we will fail
driver load.

Fixes: f62b8bb8 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9c14e46
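The decision described in this commit message reduces to a three-way classification. The sketch below is an assumption-laden illustration: the enum, the function name, and the two inputs (a driver-side failure flag and the firmware status) are hypothetical and do not mirror the driver's actual signatures.

```c
/* Hypothetical outcomes of the Query ISSI step during driver load. */
enum example_issi_outcome {
	EXAMPLE_LOAD_FAILS,	/* failure originated in the driver: abort load */
	EXAMPLE_ASSUME_ISSI_0,	/* firmware rejected the command: use ISSI=0 */
	EXAMPLE_ISSI_QUERIED	/* command succeeded: continue with the result */
};

/*
 * driver_failed: non-zero if the command failed inside the driver itself.
 * fw_status:     status returned by firmware, 0 meaning success.
 */
enum example_issi_outcome example_classify_query_issi(int driver_failed,
						      unsigned int fw_status)
{
	if (driver_failed)
		return EXAMPLE_LOAD_FAILS;
	if (fw_status != 0)
		return EXAMPLE_ASSUME_ISSI_0;	/* any FW error, not only BAD_OP */
	return EXAMPLE_ISSI_QUERIED;
}
```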

net/mlx5: Remove duplicate pci dev name print · 9e5b2fc1

Kamal Heib authored Dec 06, 2016

Remove duplicate pci dev name printing from mlx5_core_warn/dbg.

Fixes: 5a788398 ('net/mlx5_core: Improve mlx5 messages')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e5b2fc1

net/mlx5: Verify module parameters · f663ad98

Kamal Heib authored Dec 06, 2016

Verify the mlx5_core module parameters by making sure that they are in
the expected range and, if they aren't, restoring them to their default
values.

Fixes: 9603b61d ('mlx5: Move pci device handling from mlx5_ib to mlx5_core')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f663ad98
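The range-check-then-restore pattern described here can be sketched in a few lines. example_verify_param() is a hypothetical helper; the real parameter names, ranges, and defaults are not given in the commit message and are not assumed here.

```c
/*
 * Return value unchanged if it lies in [min, max]; otherwise fall back
 * to the default instead of rejecting the module load.
 */
int example_verify_param(int value, int min, int max, int def)
{
	if (value < min || value > max)
		return def;
	return value;
}
```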

04 Dec, 2016 4 commits

ipv4: fib: Replay events when registering FIB notifier · c3852ef7

Ido Schimmel authored Dec 03, 2016

Commit b90eb754 ("fib: introduce FIB notification infrastructure")
introduced a new notification chain to notify listeners (e.g., switchdev
drivers) about addition and deletion of routes.

However, upon registration to the chain the FIB tables can already be
populated, which means potential listeners will have an incomplete view
of the tables.

Solve that by dumping the FIB tables and replaying the events to the
passed notification block. The dump itself is done using RCU in order
not to starve consumers that need RTNL to make progress.

The integrity of the dump is ensured by reading the FIB change sequence
counter before and after the dump under RTNL. This allows us to avoid
the problematic situation in which the dumping process sends an ENTRY_ADD
notification following ENTRY_DEL generated by another process holding
RTNL.

Callers of the registration function may pass a callback that is
executed in case the dump was inconsistent with current FIB tables.

The number of retries until a consistent dump is achieved is set to a
fixed number to prevent callers from looping for long periods of time.
In case current limit proves to be problematic in the future, it can be
easily converted to be configurable using a sysctl.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3852ef7
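The dump-and-verify loop described in this commit message can be modelled in single-threaded userspace code. Everything here is a sketch under stated assumptions: the types and functions are hypothetical, EXAMPLE_MAX_RETRIES is an arbitrary value, and a concurrent writer is simulated by letting the listener's own callback bump the sequence counter. In the kernel the counter is read under RTNL and the dump runs under RCU.

```c
#define EXAMPLE_MAX_RETRIES 5	/* assumed fixed retry limit */

struct example_fib {
	unsigned int seq;	/* bumped on every table change */
	int n_routes;
	int routes[8];
};

struct example_listener {
	int seen;			/* routes replayed to this listener */
	int flushed;			/* times the inconsistency callback ran */
	struct example_fib *racy_fib;	/* table to modify while being dumped */
	int races_left;			/* simulated concurrent changes remaining */
};

/* Replay one route; optionally simulate another process changing the table. */
void example_notify(struct example_listener *l, int route)
{
	(void)route;
	l->seen++;
	if (l->races_left > 0) {
		l->races_left--;
		l->racy_fib->seq++;
	}
}

/* Called when a dump turned out to be inconsistent: drop what was replayed. */
void example_flush(struct example_listener *l)
{
	l->seen = 0;
	l->flushed++;
}

/* Returns 0 once a consistent dump was replayed, -1 after too many retries. */
int example_register_with_replay(struct example_fib *fib,
				 struct example_listener *l)
{
	for (int attempt = 0; attempt < EXAMPLE_MAX_RETRIES; attempt++) {
		unsigned int before = fib->seq;

		for (int i = 0; i < fib->n_routes; i++)
			example_notify(l, fib->routes[i]);

		if (fib->seq == before)
			return 0;	/* nothing changed during the dump */
		example_flush(l);	/* stale view: discard and retry */
	}
	return -1;
}
```

Bounding the loop keeps a caller from spinning indefinitely when the table changes faster than it can be dumped, which is the trade-off the last paragraph of the commit message describes.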

mlxsw: spectrum_router: Implement FIB offload in deferred work · 3057224e

Ido Schimmel authored Dec 03, 2016

FIB offload is currently done in process context with RTNL held, but
we're about to dump the FIB tables in RCU critical section, so we can no
longer sleep.

Instead, defer the operation to process context using deferred work. Make
sure fib info isn't freed while the work is queued by taking a reference
on it and releasing it after the operation is done.

Deferring the operation is valid because the upper layers always assume
the operation was successful. If it's not, then the driver-specific
abort mechanism is called and all routed traffic is directed to slow
path.

The work items are submitted to an ordered workqueue to prevent a
mismatch between the kernel's FIB table and the device's.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3057224e

mlxsw: core: Create an ordered workqueue for FIB offload · a3832b31

Ido Schimmel authored Dec 03, 2016

We're going to start processing FIB entries addition / deletion events
in deferred work. These work items must be processed in the order they
were submitted or otherwise we can have differences between the kernel's
FIB table and the device's.

Solve this by creating an ordered workqueue to which these work items
will be submitted. Note that we can't simply convert the current
workqueue to be ordered, as EMAD re-transmissions are also processed in
deferred work.

Later on, we can migrate other work items to this workqueue, such as FDB
notification processing and nexthop resolution, since they all take the
same lock anyway.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a3832b31
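Why ordering matters for FIB events can be seen with a small FIFO model. This is a userspace sketch with hypothetical names, not the kernel workqueue API: work items are executed strictly in submission order, so an add followed by a delete of the same route leaves the modelled device table matching the kernel's view. The caller is assumed to keep prefix within the bounds of device_table.

```c
#define EXAMPLE_WQ_DEPTH 16

typedef void (*example_work_fn)(int prefix, int *device_table);

struct example_work {
	example_work_fn fn;
	int prefix;
};

/* Minimal FIFO standing in for an ordered workqueue. */
struct example_ordered_wq {
	struct example_work items[EXAMPLE_WQ_DEPTH];
	unsigned int head;	/* next item to run */
	unsigned int tail;	/* next free slot */
};

void example_route_add(int prefix, int *device_table)
{
	device_table[prefix] = 1;
}

void example_route_del(int prefix, int *device_table)
{
	device_table[prefix] = 0;
}

/* Queue a work item; returns -1 if the queue is full. */
int example_queue_work(struct example_ordered_wq *wq, example_work_fn fn,
		       int prefix)
{
	if (wq->tail - wq->head == EXAMPLE_WQ_DEPTH)
		return -1;
	wq->items[wq->tail % EXAMPLE_WQ_DEPTH].fn = fn;
	wq->items[wq->tail % EXAMPLE_WQ_DEPTH].prefix = prefix;
	wq->tail++;
	return 0;
}

/* Run all queued items one at a time, in the order they were submitted. */
void example_drain(struct example_ordered_wq *wq, int *device_table)
{
	while (wq->head != wq->tail) {
		struct example_work *w = &wq->items[wq->head % EXAMPLE_WQ_DEPTH];

		wq->head++;
		w->fn(w->prefix, device_table);
	}
}
```

If the delete were allowed to run before the add, the route would remain programmed in the device after the kernel had removed it.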

mlx4: use reset to set mac header · 69029109

Zhang Shengju authored Dec 02, 2016

Since the offset is zero, there is no need to use the set function. The
reset function is more straightforward and avoids the unnecessary add
operation performed by the set function.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

69029109
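The equivalence this commit relies on can be shown with a simplified model of the two helpers. struct example_skb and both functions are hypothetical stand-ins for the kernel's sk_buff accessors; the set variant is modelled as a reset followed by adding the offset.

```c
/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct example_skb {
	unsigned char *head;		/* start of the buffer */
	unsigned char *data;		/* current start of packet data */
	unsigned short mac_header;	/* offset of the MAC header from head */
};

/* Point the MAC header at the current data pointer. */
void example_reset_mac_header(struct example_skb *skb)
{
	skb->mac_header = (unsigned short)(skb->data - skb->head);
}

/* Point the MAC header at data + offset: a reset plus one extra addition. */
void example_set_mac_header(struct example_skb *skb, int offset)
{
	example_reset_mac_header(skb);
	skb->mac_header += offset;
}
```

With an offset of zero both helpers store the same value, so the reset variant expresses the intent directly and skips the addition.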

03 Dec, 2016 6 commits

mlx4: fix use-after-free in mlx4_en_fold_software_stats() · 7f7bf160

Eric Dumazet authored Dec 01, 2016

My recent commit to get more precise rx/tx counters in ndo_get_stats64()
can lead to crashes at device dismantle, as Jesper found out.

We must prevent mlx4_en_fold_software_stats() trying to access
tx/rx rings if they are deleted.

Fix this by adding a test against priv->port_up in
mlx4_en_fold_software_stats()

Calling mlx4_en_fold_software_stats() from mlx4_en_stop_port()
allows us to eventually broadcast the latest/current counters to
rtnetlink monitors.

Fixes: 40931b85 ("mlx4: give precise rx/tx bytes/packets counters")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-and-bisected-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@dev.mellanox.co.il>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7f7bf160
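The guard described in this fix can be modelled as follows. This is a minimal userspace sketch with hypothetical types and names: per-ring counters are summed on demand, the fold is skipped once the port is down, and the stop path folds one last time before the rings go away (setting the pointer to NULL stands in for freeing them).

```c
#include <stddef.h>

struct example_ring {
	unsigned long packets;
	unsigned long bytes;
};

struct example_priv {
	int port_up;			/* non-zero while the rings are valid */
	int n_rings;
	struct example_ring *rings;
	unsigned long rx_packets;	/* folded totals reported to callers */
	unsigned long rx_bytes;
};

/* Sum the per-ring counters, but never touch rings after the port is down. */
void example_fold_software_stats(struct example_priv *priv)
{
	unsigned long packets = 0, bytes = 0;

	if (!priv->port_up)
		return;		/* keep the last folded totals */

	for (int i = 0; i < priv->n_rings; i++) {
		packets += priv->rings[i].packets;
		bytes += priv->rings[i].bytes;
	}
	priv->rx_packets = packets;
	priv->rx_bytes = bytes;
}

/* Capture the final counters before the rings are torn down. */
void example_stop_port(struct example_priv *priv)
{
	example_fold_software_stats(priv);
	priv->port_up = 0;
	priv->rings = NULL;
}
```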

net/mlx5e: Support adding ingress tc rule when egress device flag is set · ebe06875

Hadar Hen Zion authored Dec 01, 2016

When ndo_setup_tc is called with an egress_dev flag set, it means that
the ndo call was executed on the mirred action (egress) device and not
on the ingress device.

In order to support this kind of ndo_setup_tc call, and insert the
correct decap rule to the hardware, the uplink device on the same eswitch
should be found.

Currently, we use this resolution between the mirred device and the
uplink on the same eswitch to offload vxlan shared device decap rules.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebe06875

net/mlx5e: Save the represntor netdevice as part of the representor · 726293f1

Hadar Hen Zion authored Dec 01, 2016

Replace the representor private data with a net_device pointer holding the
representor netdevice, instead of a void pointer holding mlx5e_priv.

It will be used by a new eswitch service function that returns the uplink
representor netdevice.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

726293f1

net/mlx5e: Bring back representor's ndos that were accidentally removed · 718f13e7

Hadar Hen Zion authored Dec 01, 2016

The VF Representor udp tunnel ndo entries were removed by mistake;
restore them.

Fixes: 370bad0f ('net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

718f13e7

net/mlx5e: skip loopback selftest with !CONFIG_INET · d709b2a1

Arnd Bergmann authored Nov 30, 2016

When CONFIG_INET is disabled, the new selftest results in a link
error:

drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.o: In function `mlx5e_test_loopback':
en_selftest.c:(.text.mlx5e_test_loopback+0x2ec): undefined reference to `ip_send_check'
en_selftest.c:(.text.mlx5e_test_loopback+0x34c): undefined reference to `udp4_hwcsum'

This hides the specific test in that configuration.

Fixes: 0952da79 ("net/mlx5e: Add support for loopback selftest")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

d709b2a1

bpf, xdp: drop rcu_read_lock from bpf_prog_run_xdp and move to caller · 366cbf2f

Daniel Borkmann authored Nov 30, 2016

After 326fe02d ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"),
the rcu_read_lock() in bpf_prog_run_xdp() is superfluous, since callers
need to hold rcu_read_lock() already to make sure BPF program doesn't
get released in the background.

Thus, drop it from bpf_prog_run_xdp(), as it can otherwise be misleading.
Still keeping the bpf_prog_run_xdp() is useful as it allows for grepping
in XDP supported drivers and to keep the typecheck on the context intact.
For mlx4, this means we don't have a double rcu_read_lock() anymore. nfp can
just make use of bpf_prog_run_xdp(), too. For qede, just move rcu_read_lock()
out of the helper. When the driver gets atomic replace support, this will
move to call-sites eventually.

mlx5 needs actual fixing as it has the same issue as described already in
326fe02d ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"),
that is, we're under RCU bh at this time, BPF programs are released via
call_rcu(), and call_rcu() != call_rcu_bh(), so we need to properly mark
read side as programs can get xchg()'ed in mlx5e_xdp_set() without queue
reset.

Fixes: 86994156 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

366cbf2f

02 Dec, 2016 7 commits

net/mlx5e: Remove flow encap entry in the correct place · 5067b602

Roi Dayan authored Nov 30, 2016

Handling flow encap entry should be inside tc del flow
and is only relevant for offloaded eswitch TC rules.

Fixes: 11a457e9b6c1 ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5067b602

net/mlx5e: Refactor tc del flow to accept mlx5e_tc_flow instance · 961e8979

Roi Dayan authored Nov 30, 2016

Change the function that deletes offloaded TC rule to get
struct mlx5e_tc_flow instance which contains both the flow
handle and flow attributes. This is a cleanup needed for
downstream patches; it doesn't change any functionality.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

961e8979

net/mlx5e: Correct cleanup order when deleting offloaded TC rules · 86a33ae1

Roi Dayan authored Nov 30, 2016

According to the reverse unwinding principle, on delete time we should
first handle deletion of the steering rule and later handle the vlan
deletion from the eswitch.

Fixes: 8b32580d ("net/mlx5e: Add TC vlan action for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

86a33ae1
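The reverse unwinding principle mentioned in this commit can be illustrated with a small trace. The sketch is hypothetical: it assumes setup handles the VLAN action first and the steering rule second (inferred from the deletion order the message asks for) and records each step as a character so the order can be inspected.

```c
/* Records the order of setup and teardown steps. */
struct example_trace {
	char events[8];
	int n;
};

void example_note(struct example_trace *t, char event)
{
	if (t->n < (int)sizeof(t->events))
		t->events[t->n++] = event;
}

/* Setup order: VLAN action ('V'), then steering rule ('S'). */
int example_add_rule(struct example_trace *t, int steering_fails)
{
	example_note(t, 'V');
	if (steering_fails)
		goto err_del_vlan;
	example_note(t, 'S');
	return 0;

err_del_vlan:
	example_note(t, 'v');	/* undo only what was already set up */
	return -1;
}

/* Teardown mirrors setup in reverse: steering rule ('s'), then VLAN ('v'). */
void example_del_rule(struct example_trace *t)
{
	example_note(t, 's');
	example_note(t, 'v');
}
```

A successful add followed by a delete yields the trace V, S, s, v: the last resource acquired is the first one released.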

net/mlx5e: Remove redundant hashtable lookup in configure flower · 53636068

Roi Dayan authored Nov 30, 2016

We will never find a flow with the same cookie as cls_flower always
allocates a new flow and the cookie is the allocated memory address.

Fixes: e3a2b7ed ("net/mlx5e: Support offload cls_flower with drop action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

53636068

net/mlx5e: Create UMR MKey per RQ · ec8b9981

Tariq Toukan authored Nov 30, 2016

In Striding RQ implementation, we used a single UMR
(User-Mode Memory Registration) memory key for all RQs.
When the product of RQs number*size gets high, we hit a
limitation of u16 field size in FW.

Here we move to using a UMR memory key per RQ, so we can
scale to any number of rings, with the maximum buffer
size in each.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ec8b9981

net/mlx5e: Move function mlx5e_create_umr_mkey · 3608ae77

Tariq Toukan authored Nov 30, 2016

In the next patch we are going to create a UMR MKey per RQ, so we need
mlx5e_create_umr_mkey declared before mlx5e_create_rq.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3608ae77

net/mlx5e: Implement Fragmented Work Queue (WQ) · 1c1b5228

Tariq Toukan authored Nov 30, 2016

Add new type of struct mlx5_frag_buf which is used to allocate fragmented
buffers rather than contiguous, and make the Completion Queues (CQs) use
it as they are big (default of 2MB per CQ in Striding RQ).

This fixes the failures of type:
"mlx5e_open_locked: mlx5e_open_channels failed, -12"
due to dma_zalloc_coherent having insufficient contiguous coherent memory
to satisfy the driver's request when the user tries to set up more or
larger rings.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c1b5228

01 Dec, 2016 2 commits

ethernet :mellanox :mlx5: Replace pci_pool_alloc by pci_pool_zalloc · fec668d3

Souptick Joarder authored Nov 30, 2016

In alloc_cmd_box(), replace pci_pool_alloc() followed by memset with
pci_pool_zalloc().

Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec668d3
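The same cleanup has a direct userspace analogue, sketched here with the C standard library rather than the PCI pool API. The function names are hypothetical; malloc() plus memset() plays the role of pci_pool_alloc() plus memset, and calloc() plays the role of pci_pool_zalloc().

```c
#include <stdlib.h>
#include <string.h>

/* Before: allocate, then clear the buffer in a second step. */
void *example_alloc_then_clear(size_t size)
{
	void *buf = malloc(size);

	if (buf)
		memset(buf, 0, size);
	return buf;
}

/* After: a single call that returns zeroed memory. */
void *example_zalloc(size_t size)
{
	return calloc(1, size);
}
```

Both functions hand back a zero-filled buffer; the second form removes a step that is easy to forget or to get wrong with a mismatched size.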

ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc · 77d1337b

Souptick Joarder authored Nov 30, 2016

In mlx4_alloc_cmd_mailbox(), replace pci_pool_alloc() followed by memset
with pci_pool_zalloc().

Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77d1337b

30 Nov, 2016 6 commits

mlxsw: core: Change order of operations in removal path · 523779c7

Ido Schimmel authored Nov 28, 2016

We call bus->init() before allocating 'lag.mapping'. Change the order of
operations in removal path to reflect that.

This makes the error path of mlxsw_core_bus_device_register() symmetric
with mlxsw_core_bus_device_unregister().

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

523779c7

mlxsw: core: Add missing rollback in error path · 81d4d728

Ido Schimmel authored Nov 28, 2016

Without this rollback, the thermal zone is still registered during the
error path, whereas its private data is freed upon the destruction of
the underlying bus device due to the use of devm_kzalloc(). This results
in use after free.

Fix this by calling mlxsw_thermal_fini() from the appropriate place in
the error path.

Fixes: a50c1e35 ("mlxsw: core: Implement thermal zone")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

81d4d728

mlxsw: spectrum_buffers: Limit size of pools · 87259f18

Ido Schimmel authored Nov 28, 2016

The shared buffer pools are containers whose size is used to calculate
the maximum usage for packets from / to a specific port / {port, PG/TC},
when dynamic threshold is employed.

While it's perfectly fine for the sum of the pools to exceed the maximum
size of the shared buffer, a single pool cannot.

Add a check when the pool size is set and forbid sizes larger than the
maximum size of the shared buffer.

Without the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype dynamic
// No error is returned

With the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype dynamic
devlink answers: Invalid argument

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

87259f18
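The added check amounts to a bounds test before the new size is stored. The sketch below uses a hypothetical function and parameters; returning -EINVAL matches the "Invalid argument" answer shown in the devlink output above, but the surrounding driver code is not reproduced.

```c
#include <errno.h>

/*
 * Store a new pool size unless it exceeds the maximum size of the
 * shared buffer reported by the device.
 */
int example_sb_pool_set(unsigned int size, unsigned int max_buff_size,
			unsigned int *pool_size)
{
	if (size > max_buff_size)
		return -EINVAL;	/* a single pool cannot exceed the buffer */

	*pool_size = size;
	return 0;
}
```

Only the individual pool is checked; the sum of all pools may still exceed the shared buffer, which the commit message describes as acceptable.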

mlxsw: resources: Add maximum buffer size · f414b48e

Ido Schimmel authored Nov 28, 2016

We need to be able to limit the size of shared buffer pools, so query
the maximum size from the device during init.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f414b48e

mlxsw: switchib: add MLXSW_PCI dependency · 67ea7ef1

Arnd Bergmann authored Nov 28, 2016

The newly added switchib driver fails to link if MLXSW_PCI=m:

drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function `mlxsw_sib_module_exit':
switchib.c:(.exit.text+0x8): undefined reference to `mlxsw_pci_driver_unregister'
switchib.c:(.exit.text+0x10): undefined reference to `mlxsw_pci_driver_unregister'
drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function `mlxsw_sib_module_init':
switchib.c:(.init.text+0x28): undefined reference to `mlxsw_pci_driver_register'
switchib.c:(.init.text+0x38): undefined reference to `mlxsw_pci_driver_register'
switchib.c:(.init.text+0x48): undefined reference to `mlxsw_pci_driver_unregister'

The other two such sub-drivers have a dependency, so add the same one
here. In theory we could allow this driver if MLXSW_PCI is disabled,
but it's probably not worth it.

Fixes: d1ba5263 ("mlxsw: switchib: Introduce SwitchIB and SwitchIB silicon driver")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

67ea7ef1

mlx4: give precise rx/tx bytes/packets counters · 40931b85

Eric Dumazet authored Nov 25, 2016

mlx4 stats are chaotic because a deferred work queue is responsible
for updating them every 250 ms.

Even sampling stats every one second with "sar -n DEV 1" gives
variations like the following :

lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:39:22         eth0 146877.00 3265554.00   9467.15 4828168.50
07:39:23         eth0 146587.00 3260329.00   9448.15 4820445.98
07:39:24         eth0 146894.00 3259989.00   9468.55 4819943.26
07:39:25         eth0 110368.00 2454497.00   7113.95 3629012.17  <<>>
07:39:26         eth0 146563.00 3257502.00   9447.25 4816266.23
07:39:27         eth0 145678.00 3258292.00   9389.79 4817414.39
07:39:28         eth0 145268.00 3253171.00   9363.85 4809852.46
07:39:29         eth0 146439.00 3262185.00   9438.97 4823172.48
07:39:30         eth0 146758.00 3264175.00   9459.94 4826124.13
07:39:31         eth0 146843.00 3256903.00   9465.44 4815381.97
Average:         eth0 142827.50 3179259.70   9206.30 4700578.16

This patch allows rx/tx bytes/packets counters being folded at the
time we need stats.

We now can fetch stats every 1 ms if we want to check NIC behavior
on a small time window. It is also easier to detect anomalies.

lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:42:50         eth0 142915.00 3177696.00   9212.06 4698270.42
07:42:51         eth0 143741.00 3200232.00   9265.15 4731593.02
07:42:52         eth0 142781.00 3171600.00   9202.92 4689260.16
07:42:53         eth0 143835.00 3192932.00   9271.80 4720761.39
07:42:54         eth0 141922.00 3165174.00   9147.64 4679759.21
07:42:55         eth0 142993.00 3207038.00   9216.78 4741653.05
07:42:56         eth0 141394.06 3154335.64   9113.85 4663731.73
07:42:57         eth0 141850.00 3161202.00   9144.48 4673866.07
07:42:58         eth0 143439.00 3180736.00   9246.05 4702755.35
07:42:59         eth0 143501.00 3210992.00   9249.99 4747501.84
Average:         eth0 142835.66 3182165.93   9206.98 4704874.08

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

40931b85

29 Nov, 2016 2 commits

net/mlx4: Fix uninitialized fields in rule when adding promiscuous mode to... · 44b911e7

Jack Morgenstein authored Nov 27, 2016

net/mlx4: Fix uninitialized fields in rule when adding promiscuous mode to device managed flow steering

In procedure mlx4_flow_steer_promisc_add(), several fields
were left uninitialized in the rule structure.
Correctly initialize these fields.

Fixes: 592e49dd ("net/mlx4: Implement promiscuous mode with device managed flow-steering")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

44b911e7

Revert "net/mlx4_en: Avoid unregister_netdev at shutdown flow" · b4353708

Tariq Toukan authored Nov 27, 2016

This reverts commit 9d769311.

Using unregister_netdev at shutdown flow prevents calling
the netdev's ndos or trying to access its freed resources.

This fixes crashes like the following:
 Call Trace:
  [<ffffffff81587a6e>] dev_get_phys_port_id+0x1e/0x30
  [<ffffffff815a36ce>] rtnl_fill_ifinfo+0x4be/0xff0
  [<ffffffff815a53f3>] rtmsg_ifinfo_build_skb+0x73/0xe0
  [<ffffffff815a5476>] rtmsg_ifinfo.part.27+0x16/0x50
  [<ffffffff815a54c8>] rtmsg_ifinfo+0x18/0x20
  [<ffffffff8158a6c6>] netdev_state_change+0x46/0x50
  [<ffffffff815a5e78>] linkwatch_do_dev+0x38/0x50
  [<ffffffff815a6165>] __linkwatch_run_queue+0xf5/0x170
  [<ffffffff815a6205>] linkwatch_event+0x25/0x30
  [<ffffffff81099a82>] process_one_work+0x152/0x400
  [<ffffffff8109a325>] worker_thread+0x125/0x4b0
  [<ffffffff8109a200>] ? rescuer_thread+0x350/0x350
  [<ffffffff8109fc6a>] kthread+0xca/0xe0
  [<ffffffff8109fba0>] ? kthread_park+0x60/0x60
  [<ffffffff816a1285>] ret_from_fork+0x25/0x30

Fixes: 9d769311 ("net/mlx4_en: Avoid unregister_netdev at shutdown flow")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4353708

openanolis / cloud-kernel synced successfully more than 1 year ago
