提交 · 21635d9203e1cf2b73b67e9a86059a62f62a3563 · openeuler / Kernel

01 12月, 2021 2 次提交

net: dsa: mv88e6xxx: Fix application of erratum 4.8 for 88E6393X · 21635d92

由 Marek Behún 提交于 11月 30, 2021

According to SERDES scripts for 88E6393X, erratum 4.8 has to be applied
every time before SerDes is powered on.

Split the code for erratum 4.8 into separate function and call it in
mv88e6393x_serdes_power().

Fixes: de776d0d ("net: dsa: mv88e6xxx: add support for mv88e6393x family")
Signed-off-by: NMarek Behún <kabel@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

21635d92

natsemi: xtensa: fix section mismatch warnings · b0f38e15

由 Randy Dunlap 提交于 11月 29, 2021

Fix section mismatch warnings in xtsonic. The first one appears to be
bogus and after fixing the second one, the first one is gone.

WARNING: modpost: vmlinux.o(.text+0x529adc): Section mismatch in reference from the function sonic_get_stats() to the function .init.text:set_reset_devices()
The function sonic_get_stats() references
the function __init set_reset_devices().
This is often because sonic_get_stats lacks a __init
annotation or the annotation of set_reset_devices is wrong.

WARNING: modpost: vmlinux.o(.text+0x529b3b): Section mismatch in reference from the function xtsonic_probe() to the function .init.text:sonic_probe1()
The function xtsonic_probe() references
the function __init sonic_probe1().
This is often because xtsonic_probe lacks a __init
annotation or the annotation of sonic_probe1 is wrong.

Fixes: 74f2a5f0 ("xtensa: Add support for the Sonic Ethernet device for the XT2000 board.")
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Reported-by: Nkernel test robot <lkp@intel.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Chris Zankel <chris@zankel.net>
Cc: linux-xtensa@linux-xtensa.org
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: NMax Filippov <jcmvbkbc@gmail.com>
Link: https://lore.kernel.org/r/20211130063947.7529-1-rdunlap@infradead.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>

b0f38e15

30 11月, 2021 10 次提交

dpaa2-eth: destroy workqueue at the end of remove function · f4a8adbf

由 Dongliang Mu 提交于 11月 30, 2021

The commit c5521189 ("dpaa2-eth: support PTP Sync packet one-step
timestamping") forgets to destroy workqueue at the end of remove
function.

Fix this by adding destroy_workqueue before fsl_mc_portal_free and
free_netdev.

Fixes: c5521189 ("dpaa2-eth: support PTP Sync packet one-step timestamping")
Signed-off-by: NDongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f4a8adbf

ice: xsk: clear status_error0 for each allocated desc · d1ec975f

由 Maciej Fijalkowski 提交于 11月 29, 2021

Fix a bug in which the receiving of packets can stop in the zero-copy
driver. Ice HW ignores 3 lower bits from QRX_TAIL register, which means
that tail is bumped only on intervals of 8. Currently with XSK RX
batching in place, ice_alloc_rx_bufs_zc() clears the status_error0 only
of the last descriptor that has been allocated/taken from the XSK buffer
pool. status_error0 includes DD bit that is looked upon by the
ice_clean_rx_irq_zc() to tell if a descriptor can be processed.

The bug can be triggered when driver updates the ntu but not the
QRX_TAIL, so HW wouldn't have a chance to write to the ready
descriptors. Later on driver moves the ntc to the mentioned set of
descriptors and interprets them as a ready to be processed, since
corresponding DD bits were not cleared nor any writeback has happened
that would clear it. This can then lead to ntc == ntu case which means
that ring is empty and no further packet processing.

Fix the XSK traffic hang that can be observed when l2fwd scenario from
xdpsock is used by making sure that status_error0 is cleared for each
descriptor that is fed to HW and therefore we are sure that driver will
not processed non-valid DD bits. This will also prevent the driver from
processing the descriptors that were allocated in favor of the
previously processed ones, but writeback didn't happen yet.

Fixes: db804cfc ("ice: Use the xsk batched rx allocation interface")
Signed-off-by: NMaciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: NAlexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1ec975f

net: marvell: mvpp2: Fix the computation of shared CPUs · b83f5ac7

由 Christophe JAILLET 提交于 11月 29, 2021

'bitmap_fill()' fills a bitmap one 'long' at a time.
It is likely that an exact number of bits is expected.

Use 'bitmap_set()' instead in order not to set unexpected bits.

Fixes: e531f767 ("net: mvpp2: handle cases where more CPUs are available than s/w threads")
Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b83f5ac7

net: mscc: ocelot: fix missing unlock on error in ocelot_hwstamp_set() · 1a59c9c5

由 Wei Yongjun 提交于 11月 29, 2021

Add the missing mutex_unlock before return from function
ocelot_hwstamp_set() in the ocelot_setup_ptp_traps() error
handling case.

Fixes: 96ca08c0 ("net: mscc: ocelot: set up traps for PTP packets")
Reported-by: NHulk Robot <hulkci@huawei.com>
Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20211129151652.1165433-1-weiyongjun1@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

1a59c9c5

wireguard: ratelimiter: use kvcalloc() instead of kvzalloc() · 4e3fd721

由 Gustavo A. R. Silva 提交于 11月 29, 2021

Use 2-factor argument form kvcalloc() instead of kvzalloc().

Link: https://github.com/KSPP/linux/issues/162
Fixes: e7096c13 ("net: WireGuard secure network tunnel")
Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
[Jason: Gustavo's link above is for KSPP, but this isn't actually a
 security fix, as table_size is bounded to 8192 anyway, and gcc realizes
 this, so the codegen comes out to be about the same.]
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

4e3fd721

wireguard: receive: drop handshakes if queue lock is contended · fb32f4f6

由 Jason A. Donenfeld 提交于 11月 29, 2021

If we're being delivered packets from multiple CPUs so quickly that the
ring lock is contended for CPU tries, then it's safe to assume that the
queue is near capacity anyway, so just drop the packet rather than
spinning. This helps deal with multicore DoS that can interfere with
data path performance. It _still_ does not completely fix the issue, but
it again chips away at it.
Reported-by: NStreun Fabio <fstreun@student.ethz.ch>
Fixes: e7096c13 ("net: WireGuard secure network tunnel")
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

fb32f4f6

wireguard: receive: use ring buffer for incoming handshakes · 886fcee9

由 Jason A. Donenfeld 提交于 11月 29, 2021

Apparently the spinlock on incoming_handshake's skb_queue is highly
contended, and a torrent of handshake or cookie packets can bring the
data plane to its knees, simply by virtue of enqueueing the handshake
packets to be processed asynchronously. So, we try switching this to a
ring buffer to hopefully have less lock contention. This alleviates the
problem somewhat, though it still isn't perfect, so future patches will
have to improve this further. However, it at least doesn't completely
diminish the data plane.
Reported-by: NStreun Fabio <fstreun@student.ethz.ch>
Reported-by: NJoel Wanner <joel.wanner@inf.ethz.ch>
Fixes: e7096c13 ("net: WireGuard secure network tunnel")
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

886fcee9

wireguard: device: reset peer src endpoint when netns exits · 20ae1d6a

由 Jason A. Donenfeld 提交于 11月 29, 2021

Each peer's endpoint contains a dst_cache entry that takes a reference
to another netdev. When the containing namespace exits, we take down the
socket and prevent future sockets from being created (by setting
creating_net to NULL), which removes that potential reference on the
netns. However, it doesn't release references to the netns that a netdev
cached in dst_cache might be taking, so the netns still might fail to
exit. Since the socket is gimped anyway, we can simply clear all the
dst_caches (by way of clearing the endpoint src), which will release all
references.

However, the current dst_cache_reset function only releases those
references lazily. But it turns out that all of our usages of
wg_socket_clear_peer_endpoint_src are called from contexts that are not
exactly high-speed or bottle-necked. For example, when there's
connection difficulty, or when userspace is reconfiguring the interface.
And in particular for this patch, when the netns is exiting. So for
those cases, it makes more sense to call dst_release immediately. For
that, we add a small helper function to dst_cache.

This patch also adds a test to netns.sh from Hangbin Liu to ensure this
doesn't regress.
Tested-by: NHangbin Liu <liuhangbin@gmail.com>
Reported-by: NXiumei Mu <xmu@redhat.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Fixes: 900575aa ("wireguard: device: avoid circular netns references")
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

20ae1d6a

wireguard: main: rename 'mod_init' & 'mod_exit' functions to be module-specific · b251b711

由 Randy Dunlap 提交于 11月 29, 2021

Rename module_init & module_exit functions that are named
"mod_init" and "mod_exit" so that they are unique in both the
System.map file and in initcall_debug output instead of showing
up as almost anonymous "mod_init".

This is helpful for debugging and in determining how long certain
module_init calls take to execute.
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

b251b711

wireguard: allowedips: add missing __rcu annotation to satisfy sparse · ae928781

由 Jason A. Donenfeld 提交于 11月 29, 2021

A __rcu annotation got lost during refactoring, which caused sparse to
become enraged.

Fixes: bf7b042d ("wireguard: allowedips: free empty intermediate nodes when removing single node")
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

ae928781

29 11月, 2021 11 次提交

atlantic: Remove warn trace message. · 060a0fb7

由 Sameer Saurabh 提交于 11月 29, 2021

Remove the warn trace message - it's not a correct check here, because
the function can still be called on the device in DOWN state

Fixes: 508f2e3d ("net: atlantic: split rx and tx per-queue stats")
Signed-off-by: NSameer Saurabh <ssaurabh@marvell.com>
Signed-off-by: NSudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: NIgor Russkikh <irusskikh@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

060a0fb7

atlantic: Fix statistics logic for production hardware · 2087ced0

由 Dmitry Bogdanov 提交于 11月 29, 2021

B0 is the main and widespread device revision of atlantic2 HW. In the
current state, driver will incorrectly fetch the statistics for this
revision.

Fixes: 5cfd54d7 ("net: atlantic: minimal A2 fw_ops")
Signed-off-by: NDmitry Bogdanov <dbezrukov@marvell.com>
Signed-off-by: NSudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: NIgor Russkikh <irusskikh@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2087ced0

Remove Half duplex mode speed capabilities. · 03fa5121

由 Sameer Saurabh 提交于 11月 29, 2021

Since Half Duplex mode has been deprecated by the firmware, driver should
not advertise Half Duplex speed in ethtool support link speed values.

Fixes: 071a0204 ("net: atlantic: A2: half duplex support")
Signed-off-by: NSameer Saurabh <ssaurabh@marvell.com>
Signed-off-by: NIgor Russkikh <irusskikh@marvell.com>
Signed-off-by: NSudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

03fa5121

atlantic: Add missing DIDs and fix 115c. · 413d5e09

由 Nikita Danilov 提交于 11月 29, 2021

At the late production stages new dev ids were introduced. These are
now in production, so its important for the driver to recognize these.
And also fix the board caps for AQC115C adapter.

Fixes: b3f0c79c ("net: atlantic: A2 hw_ops skeleton")
Signed-off-by: NNikita Danilov <ndanilov@aquantia.com>
Signed-off-by: NSudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: NIgor Russkikh <irusskikh@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

413d5e09

atlantic: Fix to display FW bundle version instead of FW mac version. · 2465c802

由 Sameer Saurabh 提交于 11月 29, 2021

The correct way to reflect firmware version is to use bundle version.
Hence populating the same instead of MAC fw version.

Fixes: c1be0bf0 ("net: atlantic: common functions needed for basic A2 init/deinit hw_ops")
Signed-off-by: NSameer Saurabh <ssaurabh@marvell.com>
Signed-off-by: NSudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: NIgor Russkikh <irusskikh@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2465c802

atlatnic: enable Nbase-t speeds with base-t · aa685acd

由 Nikita Danilov 提交于 11月 29, 2021

When 2.5G is advertised, N-Base should be advertised against the T-base
caps. N5G is out of use in baseline code and driver should treat both 5G
and N5G (and also 2.5G and N2.5G) equally from user perspective.

Fixes: 5cfd54d7 ("net: atlantic: minimal A2 fw_ops")
Signed-off-by: NNikita Danilov <ndanilov@aquantia.com>
Signed-off-by: NSudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: NIgor Russkikh <irusskikh@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa685acd

atlantic: Increase delay for fw transactions · aa1dcb56

由 Dmitry Bogdanov 提交于 11月 29, 2021

The max waiting period (of 1 ms) while reading the data from FW shared
buffer is too small for certain types of data (e.g., stats). There's a
chance that FW could be updating buffer at the same time and driver
would be unsuccessful in reading data. Firmware manual recommends to
have 1 sec timeout to fix this issue.

Fixes: 5cfd54d7 ("net: atlantic: minimal A2 fw_ops")
Signed-off-by: NDmitry Bogdanov <dbezrukov@marvell.com>
Signed-off-by: NSudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: NIgor Russkikh <irusskikh@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa1dcb56

net/mlx4_en: Update reported link modes for 1/10G · 2191b1df

由 Erik Ekman 提交于 11月 28, 2021

When link modes were initially added in commit 2c762679
("net/mlx4_en: Use PTYS register to query ethtool settings") and
later updated for the new ethtool API in commit 3d8f7cc7
("net: mlx4: use new ETHTOOL_G/SSETTINGS API") the only 1/10G non-baseT
link modes configured were 1000baseKX, 10000baseKX4 and 10000baseKR.
It looks like these got picked to represent other modes since nothing
better was available.

Switch to using more specific link modes added in commit 5711a982
("net: ethtool: add support for 1000BaseX and missing 10G link modes").

Tested with MCX311A-XCAT connected via DAC.
Before:

% sudo ethtool enp3s0
Settings for enp3s0:
	Supported ports: [ FIBRE ]
	Supported link modes:   1000baseKX/Full
	                        10000baseKR/Full
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes:  1000baseKX/Full
	                        10000baseKR/Full
	Advertised pause frame use: Symmetric
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Auto-negotiation: off
	Port: Direct Attach Copper
	PHYAD: 0
	Transceiver: internal
	Supports Wake-on: d
	Wake-on: d
        Current message level: 0x00000014 (20)
                               link ifdown
	Link detected: yes

With this change:

% sudo ethtool enp3s0
	Settings for enp3s0:
	Supported ports: [ FIBRE ]
	Supported link modes:   1000baseX/Full
	                        10000baseCR/Full
 	                        10000baseSR/Full
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes:  1000baseX/Full
 	                        10000baseCR/Full
 	                        10000baseSR/Full
	Advertised pause frame use: Symmetric
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Auto-negotiation: off
	Port: Direct Attach Copper
	PHYAD: 0
	Transceiver: internal
	Supports Wake-on: d
	Wake-on: d
        Current message level: 0x00000014 (20)
                               link ifdown
	Link detected: yes
Tested-by: NMichael Stapelberg <michael@stapelberg.ch>
Signed-off-by: NErik Ekman <erik@kryo.se>
Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2191b1df

net: usb: lan78xx: lan78xx_phy_init(): use PHY_POLL instead of "0" if no IRQ is available · 817b6531

由 Sven Schuchmann 提交于 11月 27, 2021

On most systems request for IRQ 0 will fail, phylib will print an error message
and fall back to polling. To fix this set the phydev->irq to PHY_POLL if no IRQ
is available.

Fixes: cc89c323 ("lan78xx: Use irq_domain for phy interrupt from USB Int. EP")
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NSven Schuchmann <schuchmann@schleissheimer.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

817b6531

net: dsa: realtek-smi: fix indirect reg access for ports>3 · 1e89ad86

由 Luiz Angelo Daros de Luca 提交于 11月 26, 2021

This switch family can have up to 8 UTP ports {0..7}. However,
INDIRECT_ACCESS_ADDRESS_PHYNUM_MASK was using 2 bits instead of 3,
dropping the most significant bit during indirect register reads and
writes. Reading or writing ports 4, 5, 6, and 7 registers was actually
manipulating, respectively, ports 0, 1, 2, and 3 registers.

This is not sufficient but necessary to support any variant with more
than 4 UTP ports, like RTL8367S.

rtl8365mb_phy_{read,write} will now returns -EINVAL if phy is greater
than 7.

Fixes: 4af2950c ("net: dsa: realtek-smi: add rtl8365mb subdriver for RTL8365MB-VC")
Signed-off-by: NLuiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1e89ad86

net: stmmac: Avoid DMA_CHAN_CONTROL write if no Split Header support · f8e7dfd6

由 Vincent Whitchurch 提交于 11月 26, 2021

The driver assumes that split headers can be enabled/disabled without
stopping/starting the device, so it writes DMA_CHAN_CONTROL from
stmmac_set_features().  However, on my system (IP v5.10a without Split
Header support), simply writing DMA_CHAN_CONTROL when DMA is running
(for example, with the commands below) leads to a TX watchdog timeout.

 host$ socat TCP-LISTEN:1024,fork,reuseaddr - &
 device$ ethtool -K eth0 tso off
 device$ ethtool -K eth0 tso on
 device$ dd if=/dev/zero bs=1M count=10 | socat - TCP4:host:1024
 <tx watchdog timeout>

Note that since my IP is configured without Split Header support, the
driver always just reads and writes the same value to the
DMA_CHAN_CONTROL register.

I don't have access to any platforms with Split Header support so I
don't know if these writes to the DMA_CHAN_CONTROL while DMA is running
actually work properly on such systems.  I could not find anything in
the databook that says that DMA_CHAN_CONTROL should not be written when
the DMA is running.

But on systems without Split Header support, there is in any case no
need to call enable_sph() in stmmac_set_features() at all since SPH can
never be toggled, so we can avoid the watchdog timeout there by skipping
this call.

Fixes: 8c6fc097 ("net: stmmac: gmac4+: Add Split Header support")
Signed-off-by: NVincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8e7dfd6

27 11月, 2021 11 次提交

net: dsa: microchip: implement multi-bridge support · b3612ccd

由 Oleksij Rempel 提交于 11月 26, 2021

Current driver version is able to handle only one bridge at time.
Configuring two bridges on two different ports would end up shorting this
bridges by HW. To reproduce it:

	ip l a name br0 type bridge
	ip l a name br1 type bridge
	ip l s dev br0 up
	ip l s dev br1 up
	ip l s lan1 master br0
	ip l s dev lan1 up
	ip l s lan2 master br1
	ip l s dev lan2 up

	Ping on lan1 and get response on lan2, which should not happen.

This happened, because current driver version is storing one global "Port VLAN
Membership" and applying it to all ports which are members of any
bridge.
To solve this issue, we need to handle each port separately.

This patch is dropping the global port member storage and calculating
membership dynamically depending on STP state and bridge participation.

Note: STP support was broken before this patch and should be fixed
separately.

Fixes: c2e86691 ("net: dsa: microchip: break KSZ9477 DSA driver into two files")
Signed-off-by: NOleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20211126123926.2981028-1-o.rempel@pengutronix.deSigned-off-by: NJakub Kicinski <kuba@kernel.org>

b3612ccd

net: mscc: ocelot: correctly report the timestamping RX filters in ethtool · c49a35ee

由 Vladimir Oltean 提交于 11月 26, 2021

The driver doesn't support RX timestamping for non-PTP packets, but it
declares that it does. Restrict the reported RX filters to PTP v2 over
L2 and over L4.

Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

c49a35ee

net: mscc: ocelot: set up traps for PTP packets · 96ca08c0

由 Vladimir Oltean 提交于 11月 26, 2021

IEEE 1588 support was declared too soon for the Ocelot switch. Out of
reset, this switch does not apply any special treatment for PTP packets,
i.e. when an event message is received, the natural tendency is to
forward it by MAC DA/VLAN ID. This poses a problem when the ingress port
is under a bridge, since user space application stacks (written
primarily for endpoint ports, not switches) like ptp4l expect that PTP
messages are always received on AF_PACKET / AF_INET sockets (depending
on the PTP transport being used), and never being autonomously
forwarded. Any forwarding, if necessary (for example in Transparent
Clock mode) is handled in software by ptp4l. Having the hardware forward
these packets too will cause duplicates which will confuse endpoints
connected to these switches.

So PTP over L2 barely works, in the sense that PTP packets reach the CPU
port, but they reach it via flooding, and therefore reach lots of other
unwanted destinations too. But PTP over IPv4/IPv6 does not work at all.
This is because the Ocelot switch have a separate destination port mask
for unknown IP multicast (which PTP over IP is) flooding compared to
unknown non-IP multicast (which PTP over L2 is) flooding. Specifically,
the driver allows the CPU port to be in the PGID_MC port group, but not
in PGID_MCIPV4 and PGID_MCIPV6. There are several presentations from
Allan Nielsen which explain that the embedded MIPS CPU on Ocelot
switches is not very powerful at all, so every penny they could save by
not allowing flooding to the CPU port module matters. Unknown IP
multicast did not make it.

The de facto consensus is that when a switch is PTP-aware and an
application stack for PTP is running, switches should have some sort of
trapping mechanism for PTP packets, to extract them from the hardware
data path. This avoids both problems:
(a) PTP packets are no longer flooded to unwanted destinations
(b) PTP over IP packets are no longer denied from reaching the CPU since
    they arrive there via a trap and not via flooding

It is not the first time when this change is attempted. Last time, the
feedback from Allan Nielsen and Andrew Lunn was that the traps should
not be installed by default, and that PTP-unaware switching may be
desired for some use cases:
https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/

To address that feedback, the present patch adds the necessary packet
traps according to the RX filter configuration transmitted by user space
through the SIOCSHWTSTAMP ioctl. Trapping is done via VCAP IS2, where we
keep 5 filters, which are amended each time RX timestamping is enabled
or disabled on a port:
- 1 for PTP over L2
- 2 for PTP over IPv4 (UDP ports 319 and 320)
- 2 for PTP over IPv6 (UDP ports 319 and 320)

The cookie by which these filters (invisible to tc) are identified is
strategically chosen such that it does not collide with the filters used
for the ocelot-8021q tagging protocol by the Felix driver, or with the
MRP traps set up by the Ocelot library.

Other alternatives were considered, like patching user space to do
something, but there are so many ways in which PTP packets could be made
to reach the CPU, generically speaking, that "do what?" is a very valid
question. The ptp4l program from the linuxptp stack already attempts to
do something: it calls setsockopt(IP_ADD_MEMBERSHIP) (and
PACKET_ADD_MEMBERSHIP, respectively) which translates in both cases into
a dev_mc_add() on the interface, in the kernel:
https://github.com/richardcochran/linuxptp/blob/v3.1.1/udp.c#L73
https://github.com/richardcochran/linuxptp/blob/v3.1.1/raw.c

Reality shows that this is not sufficient in case the interface belongs
to a switchdev driver, as dev_mc_add() does not show the intention to
trap a packet to the CPU, but rather the intention to not drop it (it is
strictly for RX filtering, same as promiscuous does not mean to send all
traffic to the CPU, but to not drop traffic with unknown MAC DA). This
topic is a can of worms in itself, and it would be great if user space
could just stay out of it.

On the other hand, setting up PTP traps privately within the driver is
not new by any stretch of the imagination:
https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c#L833
https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/dsa/hirschmann/hellcreek.c#L1050
https://elixir.bootlin.com/linux/v5.16-rc2/source/include/linux/dsa/sja1105.h#L21

So this is the approach taken here as well. The difference here being
that we prepare and destroy the traps per port, dynamically at runtime,
as opposed to driver init time, because apparently, PTP-unaware
forwarding is a use case.

Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
Reported-by: NPo Liu <po.liu@nxp.com>
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

96ca08c0

net: mscc: ocelot: create a function that replaces an existing VCAP filter · 95706be1

由 Vladimir Oltean 提交于 11月 26, 2021

VCAP (Versatile Content Aware Processor) is the TCAM-based engine behind
tc flower offload on ocelot, among other things. The ingress port mask
on which VCAP rules match is present as a bit field in the actual key of
the rule. This means that it is possible for a rule to be shared among
multiple source ports. When the rule is added one by one on each desired
port, that the ingress port mask of the key must be edited and rewritten
to hardware.

But the API in ocelot_vcap.c does not allow for this. For one thing,
ocelot_vcap_filter_add() and ocelot_vcap_filter_del() are not symmetric,
because ocelot_vcap_filter_add() works with a preallocated and
prepopulated filter and programs it to hardware, and
ocelot_vcap_filter_del() does both the job of removing the specified
filter from hardware, as well as kfreeing it. That is to say, the only
option of editing a filter in place, which is to delete it, modify the
structure and add it back, does not work because it results in
use-after-free.

This patch introduces ocelot_vcap_filter_replace, which trivially
reprograms a VCAP entry to hardware, at the exact same index at which it
existed before, without modifying any list or allocating any memory.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

95706be1

net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP · 8a075464

由 Vladimir Oltean 提交于 11月 26, 2021

The ocelot driver, when asked to timestamp all receiving packets, 1588
v1 or NTP, says "nah, here's 1588 v2 for you".

According to this discussion:
https://patchwork.kernel.org/project/netdevbpf/patch/20211104133204.19757-8-martin.kaistra@linutronix.de/#24577647
drivers that downgrade from a wider request to a narrower response (or
even a response where the intersection with the request is empty) are
buggy, and should return -ERANGE instead. This patch fixes that.

Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
Suggested-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

8a075464

net: hns3: fix incorrect components info of ethtool --reset command · 82229c4d

由 Jie Wang 提交于 11月 26, 2021

Currently, HNS3 driver doesn't clear the reset flags of components after
successfully executing reset, it causes userspace info of
"Components reset" and "Components not reset" is incorrect.

So fix this problem by clear corresponding reset flag after reset process.

Fixes: ddccc5e3 ("net: hns3: add support for triggering reset by ethtool")
Signed-off-by: NJie Wang <wangjie125@huawei.com>
Signed-off-by: NGuangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

82229c4d

net: hns3: fix one incorrect value of page pool info when queried by debugfs · 9c147917

由 Hao Chen 提交于 11月 26, 2021

Currently, when user queries page pool info by debugfs command
"cat page_pool_info", the cnt of allocated page for page pool may be
incorrect because of memory inconsistency problem caused by compiler
optimization.

So this patch uses READ_ONCE() to read value of pages_state_hold_cnt to
fix this problem.

Fixes: 850bfb91 ("net: hns3: debugfs add support dumping page pool info")
Signed-off-by: NHao Chen <chenhao288@hisilicon.com>
Signed-off-by: NGuangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

9c147917

net: hns3: add check NULL address for page pool · b8af344c

由 Hao Chen 提交于 11月 26, 2021

When page pool is not enabled, its address value is still NULL and page
pool should not be accessed, so add a check for it.

Fixes: 850bfb91 ("net: hns3: debugfs add support dumping page pool info")
Signed-off-by: NHao Chen <chenhao288@hisilicon.com>
Signed-off-by: NGuangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

b8af344c

net: hns3: fix VF RSS failed problem after PF enable multi-TCs · 8d2ad993

由 Guangbin Huang 提交于 11月 26, 2021

When PF is set to multi-TCs and configured mapping relationship between
priorities and TCs, the hardware will active these settings for this PF
and its VFs.

In this case when VF just uses one TC and its rx packets contain priority,
and if the priority is not mapped to TC0, as other TCs of VF is not valid,
hardware always put this kind of packets to the queue 0. It cause this kind
of packets of VF can not be used RSS function.

To fix this problem, set tc mode of all unused TCs of VF to the setting of
TC0, then rx packet with priority which map to unused TC will be direct to
TC0.

Fixes: e2cb1dec ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: NGuangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

8d2ad993

net: qed: fix the array may be out of bound · 0435a4d0

由 zhangyue 提交于 11月 25, 2021

If the variable 'p_bit->flags' is always 0,
the loop condition is always 0.

The variable 'j' may be greater than or equal to 32.

At this time, the array 'p_aeu->bits[32]' may be out
of bound.
Signed-off-by: Nzhangyue <zhangyue1@kylinos.cn>
Link: https://lore.kernel.org/r/20211125113610.273841-1-zhangyue1@kylinos.cnSigned-off-by: NJakub Kicinski <kuba@kernel.org>

0435a4d0

net: stmmac: Disable Tx queues when reconfiguring the interface · b270bfe6

由 Yannick Vignon 提交于 11月 24, 2021

The Tx queues were not disabled in situations where the driver needed to
stop the interface to apply a new configuration. This could result in a
kernel panic when doing any of the 3 following actions:
* reconfiguring the number of queues (ethtool -L)
* reconfiguring the size of the ring buffers (ethtool -G)
* installing/removing an XDP program (ip l set dev ethX xdp)

Prevent the panic by making sure netif_tx_disable is called when stopping
an interface.

Without this patch, the following kernel panic can be observed when doing
any of the actions above:

Unable to handle kernel paging request at virtual address ffff80001238d040
[....]
 Call trace:
  dwmac4_set_addr+0x8/0x10
  dev_hard_start_xmit+0xe4/0x1ac
  sch_direct_xmit+0xe8/0x39c
  __dev_queue_xmit+0x3ec/0xaf0
  dev_queue_xmit+0x14/0x20
[...]
[ end trace 0000000000000002 ]---

Fixes: 5fabb012 ("net: stmmac: Add initial XDP support")
Fixes: aa042f60 ("net: stmmac: Add support to Ethtool get/set ring parameters")
Fixes: 0366f7e0 ("net: stmmac: add ethtool support for get/set channels")
Signed-off-by: NYannick Vignon <yannick.vignon@nxp.com>
Link: https://lore.kernel.org/r/20211124154731.1676949-1-yannick.vignon@oss.nxp.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

b270bfe6

25 11月, 2021 5 次提交

mdio: aspeed: Fix "Link is Down" issue · 9dbe33cf

由 Dylan Hung 提交于 11月 25, 2021

The issue happened randomly in runtime.  The message "Link is Down" is
popped but soon it recovered to "Link is Up".

The "Link is Down" results from the incorrect read data for reading the
PHY register via MDIO bus.  The correct sequence for reading the data
shall be:
1. fire the command
2. wait for command done (this step was missing)
3. wait for data idle
4. read data from data register

Cc: stable@vger.kernel.org
Fixes: f160e994 ("net: phy: Add mdio-aspeed")
Reviewed-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDylan Hung <dylan_hung@aspeedtech.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Reviewed-by: NRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20211125024432.15809-1-dylan_hung@aspeedtech.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

9dbe33cf

igb: fix netpoll exit with traffic · eaeace60

由 Jesse Brandeburg 提交于 11月 23, 2021

Oleksandr brought a bug report where netpoll causes trace
messages in the log on igb.

Danielle brought this back up as still occurring, so we'll try
again.

[22038.710800] ------------[ cut here ]------------
[22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
[22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0

As Alex suggested, change the driver to return work_done at the
exit of napi_poll, which should be safe to do in this driver
because it is not polling multiple queues in this single napi
context (multiple queues attached to one MSI-X vector). Several
other drivers contain the same simple sequence, so I hope
this will not create new problems.

Fixes: 16eb8815 ("igb: Refactor clean_rx_irq to reduce overhead and improve performance")
Reported-by: NOleksandr Natalenko <oleksandr@natalenko.name>
Reported-by: NDanielle Ratson <danieller@nvidia.com>
Suggested-by: NAlexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: NDanielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/20211123204000.1597971-1-jesse.brandeburg@intel.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

eaeace60

net: phylink: Force retrigger in case of latched link-fail indicator · dbae3388

由 Russell King (Oracle) 提交于 11月 23, 2021

On mv88e6xxx 1G/2.5G PCS, the SerDes register 4.2001.2 has the following
description:
  This register bit indicates when link was lost since the last
  read. For the current link status, read this register
  back-to-back.

Thus to get current link state, we need to read the register twice.

But doing that in the link change interrupt handler would lead to
potentially ignoring link down events, which we really want to avoid.

Thus this needs to be solved in phylink's resolve, by retriggering
another resolve in the event when PCS reports link down and previous
link was up, and by re-reading PCS state if the previous link was down.

The wrong value is read when phylink requests change from sgmii to
2500base-x mode, and link won't come up. This fixes the bug.

Fixes: 9525ae83 ("phylink: add phylink infrastructure")
Signed-off-by: NRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: NMarek Behún <kabel@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

dbae3388

net: phylink: Force link down and retrigger resolve on interface change · 80662f4f

由 Russell King (Oracle) 提交于 11月 23, 2021

On PHY state change the phylink_resolve() function can read stale
information from the MAC and report incorrect link speed and duplex to
the kernel message log.

Example with a Marvell 88X3310 PHY connected to a SerDes port on Marvell
88E6393X switch:
- PHY driver triggers state change due to PHY interface mode being
  changed from 10gbase-r to 2500base-x due to copper change in speed
  from 10Gbps to 2.5Gbps, but the PHY itself either hasn't yet changed
  its interface to the host, or the interrupt about loss of SerDes link
  hadn't arrived yet (there can be a delay of several milliseconds for
  this), so we still think that the 10gbase-r mode is up
- phylink_resolve()
  - phylink_mac_pcs_get_state()
    - this fills in speed=10g link=up
  - interface mode is updated to 2500base-x but speed is left at 10Gbps
  - phylink_major_config()
    - interface is changed to 2500base-x
  - phylink_link_up()
    - mv88e6xxx_mac_link_up()
      - .port_set_speed_duplex()
        - speed is set to 10Gbps
    - reports "Link is Up - 10Gbps/Full" to dmesg

Afterwards when the interrupt finally arrives for mv88e6xxx, another
resolve is forced in which we get the correct speed from
phylink_mac_pcs_get_state(), but since the interface is not being
changed anymore, we don't call phylink_major_config() but only
phylink_mac_config(), which does not set speed/duplex anymore.

To fix this, we need to force the link down and trigger another resolve
on PHY interface change event.

Fixes: 9525ae83 ("phylink: add phylink infrastructure")
Signed-off-by: NRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: NMarek Behún <kabel@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

80662f4f

lan743x: fix deadlock in lan743x_phy_link_status_change() · ddb826c2

由 Heiner Kallweit 提交于 11月 24, 2021

Usage of phy_ethtool_get_link_ksettings() in the link status change
handler isn't needed, and in combination with the referenced change
it results in a deadlock. Simply remove the call and replace it with
direct access to phydev->speed. The duplex argument of
lan743x_phy_update_flowcontrol() isn't used and can be removed.

Fixes: c10a485c ("phy: phy_ethtool_ksettings_get: Lock the phy for consistency")
Reported-by: NAlessandro B Maurici <abmaurici@gmail.com>
Tested-by: NAlessandro B Maurici <abmaurici@gmail.com>
Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/40e27f76-0ba3-dcef-ee32-a78b9df38b0f@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

ddb826c2

23 11月, 2021 1 次提交

net: marvell: mvpp2: increase MTU limit when XDP enabled · 7b1b62bc

由 Marek Behún 提交于 11月 22, 2021

Currently mvpp2_xdp_setup won't allow attaching XDP program if
  mtu > ETH_DATA_LEN (1500).

The mvpp2_change_mtu on the other hand checks whether
  MVPP2_RX_PKT_SIZE(mtu) > MVPP2_BM_LONG_PKT_SIZE.

These two checks are semantically different.

Moreover this limit can be increased to MVPP2_MAX_RX_BUF_SIZE, since in
mvpp2_rx we have
  xdp.data = data + MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM;
  xdp.frame_sz = PAGE_SIZE;

Change the checks to check whether
  mtu > MVPP2_MAX_RX_BUF_SIZE

Fixes: 07dd0a7a ("mvpp2: add basic XDP support")
Signed-off-by: NMarek Behún <kabel@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7b1b62bc

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功