提交 · ed66bfb4ce34a94174bb755eeaca85d1661d36ad · openeuler / Kernel

17 4月, 2021 40 次提交

mptcp: add tracepoint in ack_update_msk · ed66bfb4

由 Geliang Tang 提交于 4月 16, 2021

This patch added a tracepoint in ack_update_msk() to track the
incoming data_ack and window/snd_una updates.
Suggested-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ed66bfb4

mptcp: add tracepoint in get_mapping_status · 0918e34b

由 Geliang Tang 提交于 4月 16, 2021

This patch added a tracepoint in the mapping status function
get_mapping_status() to dump every mpext field.
Suggested-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0918e34b

mptcp: add tracepoint in mptcp_subflow_get_send · e10a9892

由 Geliang Tang 提交于 4月 16, 2021

This patch added a tracepoint in the packet scheduler function
mptcp_subflow_get_send().
Suggested-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e10a9892

mptcp: export mptcp_subflow_active · 43f1140b

由 Geliang Tang 提交于 4月 16, 2021

This patch moved the static function mptcp_subflow_active to protocol.h
as an inline one.
Acked-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

43f1140b

mptcp: fix format specifiers for unsigned int · e4b61351

由 Geliang Tang 提交于 4月 16, 2021

Some of the sequence numbers are printed as the negative ones in the debug
log:

[   46.250932] MPTCP: DSS
[   46.250940] MPTCP: data_fin=0 dsn64=0 use_map=0 ack64=1 use_ack=1
[   46.250948] MPTCP: data_ack=2344892449471675613
[   46.251012] MPTCP: msk=000000006e157e3f status=10
[   46.251023] MPTCP: msk=000000006e157e3f snd_data_fin_enable=0 pending=0 snd_nxt=2344892449471700189 write_seq=2344892449471700189
[   46.251343] MPTCP: msk=00000000ec44a129 ssk=00000000f7abd481 sending dfrag at seq=-1658937016627538668 len=100 already sent=0
[   46.251360] MPTCP: data_seq=16787807057082012948 subflow_seq=1 data_len=100 dsn64=1

This patch used the format specifier %u instead of %d for the unsigned int
values to fix it.

Fixes: d9ca1de8 ("mptcp: move page frag allocation in mptcp_sendmsg()")
Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e4b61351

kunit: mptcp: adhere to KUNIT formatting standard · 3fcc8a25

由 Nico Pache 提交于 4月 16, 2021

Drop 'S' from end of CONFIG_MPTCP_KUNIT_TESTS in order to adhere to the
KUNIT *_KUNIT_TEST config name format.

Fixes: a00a5822 (mptcp: move crypto test to KUNIT)
Reviewed-by: NDavid Gow <davidgow@google.com>
Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: NNico Pache <npache@redhat.com>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3fcc8a25

Merge branch 'enetc-xdp-fixes' · 820dd7a2

由 David S. Miller 提交于 4月 16, 2021

Vladimir Oltean says:

====================
Fixups for XDP on NXP ENETC

After some more XDP testing on the NXP LS1028A, this is a set of 10 bug
fixes, simplifications and tweaks, ranging from addressing Toke's feedback
(the network stack can run concurrently with XDP on the same TX rings)
to fixing some OOM conditions seen under TX congestion.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

820dd7a2

net: enetc: apply the MDIO workaround for XDP_REDIRECT too · 24e39309

由 Vladimir Oltean 提交于 4月 17, 2021

Described in fd5736bf ("enetc: Workaround for MDIO register access
issue") is a workaround for a hardware bug that requires a register
access of the MDIO controller to never happen concurrently with a
register access of a port PF. To avoid that, a mutual exclusion scheme
with rwlocks was implemented - the port PF accessors are the 'read'
side, and the MDIO accessors are the 'write' side.

When we do XDP_REDIRECT between two ENETC interfaces, all is fine
because the MDIO lock is already taken from the NAPI poll loop.

But when the ingress interface is not ENETC, just the egress is, the
MDIO lock is not taken, so we might access the port PF registers
concurrently with MDIO, which will make the link flap due to wrong
values returned from the PHY.

To avoid this, let's just slap an enetc_lock_mdio/enetc_unlock_mdio at
the beginning and ending of enetc_xdp_xmit. The fact that the MDIO lock
is designed as a rwlock is important here, because the read side is
reentrant (that is one of the main reasons why we chose it). Usually,
the way we benefit of its reentrancy is by running the data path
concurrently on both CPUs, but in this case, we benefit from the
reentrancy by taking the lock even when the lock is already taken
(and that's the situation where ENETC is both the ingress and the egress
interface for XDP_REDIRECT, which was fine before and still is fine now).
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24e39309

net: enetc: fix buffer leaks with XDP_TX enqueue rejections · 92ff9a6e

由 Vladimir Oltean 提交于 4月 17, 2021

If the TX ring is congested, enetc_xdp_tx() returns false for the
current XDP frame (represented as an array of software BDs).

This array of software TX BDs is constructed in enetc_rx_swbd_to_xdp_tx_swbd
from software BDs freshly cleaned from the RX ring. The issue is that we
scrub the RX software BDs too soon, more precisely before we know that
we can enqueue the TX BDs successfully into the TX ring.

If we can't enqueue them (and enetc_xdp_tx returns false), we call
enetc_xdp_drop which attempts to recycle the buffers held by the RX
software BDs. But because we scrubbed those RX BDs already, two things
happen:

(a) we leak their memory
(b) we populate the RX software BD ring with an all-zero rx_swbd
    structure, which makes the buffer refill path allocate more memory.

enetc_refill_rx_ring
-> if (unlikely(!rx_swbd->page))
   -> enetc_new_page

That is a recipe for fast OOM.

Fixes: 7ed2bc80 ("net: enetc: add support for XDP_TX")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

92ff9a6e

net: enetc: handle the invalid XDP action the same way as XDP_DROP · 975acc83

由 Vladimir Oltean 提交于 4月 17, 2021

When the XDP program returns an invalid action, we should free the RX
buffer.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

975acc83

net: enetc: use dedicated TX rings for XDP · 7eab503b

由 Vladimir Oltean 提交于 4月 17, 2021

It is possible for one CPU to perform TX hashing (see netdev_pick_tx)
between the 8 ENETC TX rings, and the TX hashing to select TX queue 1.

At the same time, it is possible for the other CPU to already use TX
ring 1 for XDP (either XDP_TX or XDP_REDIRECT). Since there is no mutual
exclusion between XDP and the network stack, we run into an issue
because the ENETC TX procedure is not reentrant.

The obvious approach would be to just make XDP take the lock of the
network stack's TX queue corresponding to the ring it's about to enqueue
in.

For XDP_REDIRECT, this is quite straightforward, a lock at the beginning
and end of enetc_xdp_xmit() should do the trick.

But for XDP_TX, it's a bit more complicated. For one, we do TX batching
all by ourselves for frames with the XDP_TX verdict. This is something
we would like to keep the way it is, for performance reasons. But
batching means that the network stack's lock should be kept from the
first enqueued XDP_TX frame and until we ring the doorbell. That is
mostly fine, except for cases when in the same NAPI loop we have mixed
XDP_TX and XDP_REDIRECT frames. So if enetc_xdp_xmit() gets called while
we are holding the lock from the RX NAPI, then bam, deadlock. The naive
answer could be 'just flush the XDP_TX frames first, then release the
network stack's TX queue lock, then call xdp_do_flush_map()'. But even
xdp_do_redirect() is capable of flushing the batched XDP_REDIRECT
frames, so unless we unlock/relock the TX queue around xdp_do_redirect(),
there simply isn't any clean way to protect XDP_TX from concurrent
network stack .ndo_start_xmit() on another CPU.

So we need to take a different approach, and that is to reserve two
rings for the sole use of XDP. We leave TX rings
0..ndev->real_num_tx_queues-1 to be handled by the network stack, and we
pick them from the end of the priv->tx_ring array.

We make an effort to keep the mapping done by enetc_alloc_msix() which
decides which CPU handles the TX completions of which TX ring in its
NAPI poll. So the XDP TX ring of CPU 0 is handled by TX ring 6, and the
XDP TX ring of CPU 1 is handled by TX ring 7.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7eab503b

net: enetc: increase TX ring size · ee3e875f

由 Vladimir Oltean 提交于 4月 17, 2021

Now that commit d6a2829e ("net: enetc: increase RX ring default
size") has increased the RX ring size, it is quite easy to congest the
TX rings when the traffic is predominantly XDP_TX, as the RX ring is
quite a bit larger than the TX one.

Since we bit the bullet and did the expensive thing already (larger RX
rings consume more memory pages), it seems quite foolish to keep the TX
rings small. So make them equally sized with TX.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ee3e875f

net: enetc: remove unneeded xdp_do_flush_map() · a6369fe6

由 Vladimir Oltean 提交于 4月 17, 2021

xdp_do_redirect already contains:
-> dev_map_enqueue
   -> __xdp_enqueue
      -> bq_enqueue
         -> bq_xmit_all // if we have more than 16 frames

So the logic from enetc will never be hit, because ENETC_DEFAULT_TX_WORK
is 128.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6369fe6

net: enetc: stop XDP NAPI processing when build_skb() fails · 8f50d8bb

由 Vladimir Oltean 提交于 4月 17, 2021

When the code path below fails:

enetc_clean_rx_ring_xdp // XDP_PASS
-> enetc_build_skb
   -> enetc_map_rx_buff_to_skb
      -> build_skb

enetc_clean_rx_ring_xdp will 'break', but that 'break' instruction isn't
strong enough to actually break the NAPI poll loop, just the switch/case
statement for XDP actions. So we increment rx_frm_cnt and go to the next
frames minding our own business.

Instead let's do what the skb NAPI poll function does, and break the
loop now, waiting for the memory pressure to go away. Otherwise the next
calls to build_skb() are likely to fail too.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f50d8bb

net: enetc: recycle buffers for frames with RX errors · 672f9a21

由 Vladimir Oltean 提交于 4月 17, 2021

When receiving a frame with errors, currently we do nothing with it (we
don't construct an skb or an xdp_buff), we just exit the NAPI poll loop.

Let's put the buffer back into the RX ring (similar to XDP_DROP).
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

672f9a21

net: enetc: rename the buffer reuse helpers · 6b04830d

由 Vladimir Oltean 提交于 4月 17, 2021

enetc_put_xdp_buff has nothing to do with XDP, frankly, it is just a
helper to populate the recycle end of the shadow RX BD ring
(next_to_alloc) with a given buffer.

On the other hand, enetc_put_rx_buff plays more tricks than its name
would suggest.

So let's rename enetc_put_rx_buff into enetc_flip_rx_buff to reflect the
half-page buffer reuse tricks that it employs, and enetc_put_xdp_buff
into enetc_put_rx_buff which suggests a more garden-variety operation.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b04830d

net: enetc: remove redundant clearing of skb/xdp_frame pointer in TX conf path · e9e49ae8

由 Vladimir Oltean 提交于 4月 17, 2021

Later in enetc_clean_tx_ring we have:

		/* Scrub the swbd here so we don't have to do that
		 * when we reuse it during xmit
		 */
		memset(tx_swbd, 0, sizeof(*tx_swbd));

So these assignments are unnecessary.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e9e49ae8

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · bc45f524

由 David S. Miller 提交于 4月 16, 2021

Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2021-04-16

This series contains updates to igb and igc drivers.

Ederson adjusts Tx buffer distributions in Qav mode to improve
TSN-aware traffic for igb. He also enable PPS support and auxiliary PHC
functions for igc.

Grzegorz checks that the MTA register was properly written and
retries if not for igb.

Sasha adds reporting of EEE low power idle counters to ethtool and fixes
a return value being overwritten through looping for igc.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc45f524

flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target() · 1e3d976d

由 Gustavo A. R. Silva 提交于 4月 16, 2021

Fix the following out-of-bounds warning:

net/core/flow_dissector.c:835:3: warning: 'memcpy' offset [33, 48] from the object at 'flow_keys' is out of the bounds of referenced subobject 'ipv6_src' with type '__u32[4]' {aka 'unsigned int[4]'} at offset 16 [-Warray-bounds]

The problem is that the original code is trying to copy data into a
couple of struct members adjacent to each other in a single call to
memcpy(). So, the compiler legitimately complains about it. As these
are just a couple of members, fix this by copying each one of them in
separate calls to memcpy().

This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().

Link: https://github.com/KSPP/linux/issues/109Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1e3d976d

Merge branch 'ethtool-stats' · 1c86514d

由 David S. Miller 提交于 4月 16, 2021

Jakub Kicinski says:

====================
ethtool: add uAPI for reading standard stats

Continuing the effort of providing a unified access method
to standard stats, and explicitly tying the definitions to
the standards this series adds an API for general stats
which do no fit into more targeted control APIs.

There is nothing clever here, just a netlink API for dumping
statistics defined by standards and RFCs which today end up
in ethtool -S under infinite variations of names.

This series adds basic IEEE stats (for PHY, MAC, Ctrl frames)
and RMON stats. AFAICT other RFCs only duplicate the IEEE
stats.

This series does _not_ add a netlink API to read driver-defined
stats. There seems to be little to gain from moving that part
to netlink.

The netlink message format is very simple, and aims to allow
adding stats and groups with no changes to user tooling (which
IIUC is expected for ethtool).

On user space side we can re-use -S, and make it dump
standard stats if --groups are defined.

$ ethtool -S eth0 --groups eth-phy eth-mac eth-ctrl rmon
Stats for eth0:
eth-phy-SymbolErrorDuringCarrier: 0
eth-mac-FramesTransmittedOK: 0
eth-mac-FrameTooLongErrors: 0
eth-ctrl-MACControlFramesTransmitted: 0
eth-ctrl-MACControlFramesReceived: 1
eth-ctrl-UnsupportedOpcodesReceived: 0
rmon-etherStatsUndersizePkts: 0
rmon-etherStatsJabbers: 0
rmon-rx-etherStatsPkts64Octets: 1
rmon-rx-etherStatsPkts128to255Octets: 0
rmon-rx-etherStatsPkts1024toMaxOctets: 1
rmon-tx-etherStatsPkts64Octets: 1
rmon-tx-etherStatsPkts128to255Octets: 0
rmon-tx-etherStatsPkts1024toMaxOctets: 1

v1:

Driver support for mlxsw, mlx5 and bnxt included.

Compared to the RFC I went ahead with wrapping the stats into
a 1:1 nest. Now IDs of stats can start from 0, at a cost of
slightly "careful" u64 alignment handling.

v2:

Add missing kdoc in patch 5.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1c86514d

mlx5: implement ethtool standard stats · b572ec9f

由 Jakub Kicinski 提交于 4月 16, 2021

Add support for PHY/MAC/Ctrl/RMON stats.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b572ec9f

bnxt: implement ethtool standard stats · 782bc00a

由 Jakub Kicinski 提交于 4月 16, 2021

Most of the names seem to strongly correlate with names from
the standard and RFC. Whether ..+good_frames are indeed Frames..OK
I'm the least sure of.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

782bc00a

mlxsw: implement ethtool standard stats · c1912ab0

由 Jakub Kicinski 提交于 4月 16, 2021

mlxsw has nicely grouped stats, add support for standard uAPI.
I'm guessing the register access part. Compile tested only.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c1912ab0

ethtool: add interface to read RMON stats · a8b06e9d

由 Jakub Kicinski 提交于 4月 16, 2021

Most devices maintain RMON (RFC 2819) stats - particularly
the "histogram" of packets received by size. Unlike other
RFCs which duplicate IEEE stats, the short/oversized frame
counters in RMON don't seem to match IEEE stats 1-to-1 either,
so expose those, too. Do not expose basic packet, CRC errors
etc - those are already otherwise covered.

Because standard defines packet ranges only up to 1518, and
everything above that should theoretically be "oversized"
- devices often create their own ranges.

Going beyond what the RFC defines - expose the "histogram"
in the Tx direction (assume for now that the ranges will
be the same).
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a8b06e9d

ethtool: add interface to read standard MAC Ctrl stats · bfad2b97

由 Jakub Kicinski 提交于 4月 16, 2021

Number of devices maintains the standard-based MAC control
counters for control frames. Add a API for those.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bfad2b97

ethtool: add interface to read standard MAC stats · ca224454

由 Jakub Kicinski 提交于 4月 16, 2021

Most of the MAC statistics are included in
struct rtnl_link_stats64, but some fields
are aggregated. Besides it's good to expose
these clearly hardware stats separately.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca224454

ethtool: add a new command for reading standard stats · f09ea6fb

由 Jakub Kicinski 提交于 4月 16, 2021

Add an interface for reading standard stats, including
stats which don't have a corresponding control interface.

Start with IEEE 802.3 PHY stats. There seems to be only
one stat to expose there.

Define API to not require user space changes when new
stats or groups are added. Groups are based on bitset,
stats have a string set associated.

v1: wrap stats in a nest
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f09ea6fb

docs: ethtool: document standard statistics · ddc78b36

由 Jakub Kicinski 提交于 4月 16, 2021

Add documentation for ETHTOOL_MSG_STATS_GET.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ddc78b36

docs: networking: extend the statistics documentation · f117c48c

由 Jakub Kicinski 提交于 4月 16, 2021

Make the lack of expectations for switching NICs explicit,
describe the new stats.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f117c48c

sctp: Fix out-of-bounds warning in sctp_process_asconf_param() · e5272ad4

由 Gustavo A. R. Silva 提交于 4月 16, 2021

Fix the following out-of-bounds warning:

net/sctp/sm_make_chunk.c:3150:4: warning: 'memcpy' offset [17, 28] from the object at 'addr' is out of the bounds of referenced subobject 'v4' with type 'struct sockaddr_in' at offset 0 [-Warray-bounds]

This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().

Link: https://github.com/KSPP/linux/issues/109Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e5272ad4

Merge tag 'mlx5-updates-2021-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 03e481e8

由 David S. Miller 提交于 4月 16, 2021

Saeed Mahameed says:

====================
mlx5-updates-2021-04-16

This patchset introduces updates to mlx5e netdev driver.

1) Tariq refactors TLS offloads and adds resiliency against RX resync
   failures

2) Maxim reduces code duplications by unifying channels reset flow
   regardless if channels are closed or open

3) Aya Enhances TX/RX health reporters diagnostics to expose the
   internal clock time-stamping format

4) Moshe adds support for ethtool extended link state, to show the reason
   for link down
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

03e481e8

Merge branch 'gianfar-mq-polling' · 70c18375

由 David S. Miller 提交于 4月 16, 2021

Claudiu Manoil says:

====================
net: gianfar: Drop GFAR_MQ_POLLING support

Drop long time obsolete "per NAPI multi-queue" support in gianfar,
and related (and undocumented) device tree properties.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

70c18375

powerpc: dts: fsl: Drop obsolete fsl,rx-bit-map and fsl,tx-bit-map properties · 221e8c12

由 Claudiu Manoil 提交于 4月 16, 2021

These are very old properties that were used by the "gianfar" ethernet
driver.  They don't have documented bindings and are obsolete.
Signed-off-by: NClaudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

221e8c12

gianfar: Drop GFAR_MQ_POLLING support · 8eda54c5

由 Claudiu Manoil 提交于 4月 16, 2021

Gianfar used to enable all 8 Rx queues (DMA rings) per
ethernet device, even though the controller can only
support 2 interrupt lines at most.  This meant that
multiple Rx queues would have to be grouped per NAPI poll
routine, and the CPU would have to split the budget and
service them in a round robin manner.  The overhead of
this scheme proved to outweight the potential benefits.
The alternative was to introduce the "Single Queue" polling
mode, supporting one Rx queue per NAPI, which became the
default packet processing option and helped improve the
performance of the driver.
MQ_POLLING also relies on undocumeted device tree properties
to specify how to map the 8 Rx and Tx queues to a given
interrupt line (aka "interrupt group").  Using module parameters
to enable this mode wasn't an option either.  Long story short,
MQ_POLLING became obsolete, now it is just dead code, and no
one asked for it so far.
For the Tx queues, multi-queue support (more than 1 Tx queue
per CPU) could be revisited by adding tc MQPRIO support, but
again, one has to consider that there are only 2 interrupt lines.
So the NAPI poll routine would have to service multiple Tx rings.
Signed-off-by: NClaudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8eda54c5

veth: check for NAPI instead of xdp_prog before xmit of XDP frame · 0e672f30

由 Toke Høiland-Jørgensen 提交于 4月 16, 2021

The recent patch that tied enabling of veth NAPI to the GRO flag also has
the nice side effect that a veth device can be the target of an
XDP_REDIRECT without an XDP program needing to be loaded on the peer
device. However, the patch adding this extra NAPI mode didn't actually
change the check in veth_xdp_xmit() to also look at the new NAPI pointer,
so let's fix that.

Fixes: 6788fa154546 ("veth: allow enabling NAPI even without XDP")
Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Acked-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e672f30

mld: fix suspicious RCU usage in __ipv6_dev_mc_dec() · aa8caa76

由 Taehee Yoo 提交于 4月 16, 2021

__ipv6_dev_mc_dec() internally uses sleepable functions so that caller
must not acquire atomic locks. But caller, which is addrconf_verify_rtnl()
acquires rcu_read_lock_bh().
So this warning occurs in the __ipv6_dev_mc_dec().

Test commands:
    ip netns add A
    ip link add veth0 type veth peer name veth1
    ip link set veth1 netns A
    ip link set veth0 up
    ip netns exec A ip link set veth1 up
    ip a a 2001:db8::1/64 dev veth0 valid_lft 2 preferred_lft 1

Splat looks like:
============================
WARNING: suspicious RCU usage
5.12.0-rc6+ #515 Not tainted
-----------------------------
kernel/sched/core.c:8294 Illegal context switch in RCU-bh read-side
critical section!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
4 locks held by kworker/4:0/1997:
 #0: ffff88810bd72d48 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at:
process_one_work+0x761/0x1440
 #1: ffff888105c8fe00 ((addr_chk_work).work){+.+.}-{0:0}, at:
process_one_work+0x795/0x1440
 #2: ffffffffb9279fb0 (rtnl_mutex){+.+.}-{3:3}, at:
addrconf_verify_work+0xa/0x20
 #3: ffffffffb8e30860 (rcu_read_lock_bh){....}-{1:2}, at:
addrconf_verify_rtnl+0x23/0xc60

stack backtrace:
CPU: 4 PID: 1997 Comm: kworker/4:0 Not tainted 5.12.0-rc6+ #515
Workqueue: ipv6_addrconf addrconf_verify_work
Call Trace:
 dump_stack+0xa4/0xe5
 ___might_sleep+0x27d/0x2b0
 __mutex_lock+0xc8/0x13f0
 ? lock_downgrade+0x690/0x690
 ? __ipv6_dev_mc_dec+0x49/0x2a0
 ? mark_held_locks+0xb7/0x120
 ? mutex_lock_io_nested+0x1270/0x1270
 ? lockdep_hardirqs_on_prepare+0x12c/0x3e0
 ? _raw_spin_unlock_irqrestore+0x47/0x50
 ? trace_hardirqs_on+0x41/0x120
 ? __wake_up_common_lock+0xc9/0x100
 ? __wake_up_common+0x620/0x620
 ? memset+0x1f/0x40
 ? netlink_broadcast_filtered+0x2c4/0xa70
 ? __ipv6_dev_mc_dec+0x49/0x2a0
 __ipv6_dev_mc_dec+0x49/0x2a0
 ? netlink_broadcast_filtered+0x2f6/0xa70
 addrconf_leave_solict.part.64+0xad/0xf0
 ? addrconf_join_solict.part.63+0xf0/0xf0
 ? nlmsg_notify+0x63/0x1b0
 __ipv6_ifa_notify+0x22c/0x9c0
 ? inet6_fill_ifaddr+0xbe0/0xbe0
 ? lockdep_hardirqs_on_prepare+0x12c/0x3e0
 ? __local_bh_enable_ip+0xa5/0xf0
 ? ipv6_del_addr+0x347/0x870
 ipv6_del_addr+0x3b1/0x870
 ? addrconf_ifdown+0xfe0/0xfe0
 ? rcu_read_lock_any_held.part.27+0x20/0x20
 addrconf_verify_rtnl+0x8a9/0xc60
 addrconf_verify_work+0xf/0x20
 process_one_work+0x84c/0x1440

In order to avoid this problem, it uses rcu_read_unlock_bh() for
a short time. RCU is used for avoiding freeing
ifp(struct *inet6_ifaddr) while ifp is being used. But this will
not be released even if rcu_read_unlock_bh() is used.
Because before rcu_read_unlock_bh(), it uses in6_ifa_hold(ifp).
So this is safe.

Fixes: 63ed8de4 ("mld: add mc_lock for protecting per-interface mld data")
Suggested-by: NEric Dumazet <edumazet@google.com>
Reported-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa8caa76

Merge branch 'ipa-fw-names' · d8214c7a

由 David S. Miller 提交于 4月 16, 2021

Alex Elder says:

====================
net: ipa: allow different firmware names

Add the ability to define a "firmware-name" property in the IPA DT
node, specifying an alternate name to use for the firmware file.
Used only if the AP (Trust Zone) does early IPA initialization.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d8214c7a

net: ipa: optionally define firmware name via DT · 9ce062ba

由 Alex Elder 提交于 4月 16, 2021

IPA initialization includes loading some firmware.  This step is
done either by the modem or by the AP under Trust Zone.  If the
AP loads firmware, the name of the firmware file is currently
hard-coded ("ipa_fws.mdt").

Add the ability to specify the relative path of the firmware file to
use in a property in the Device Tree IPA node.  If the property is
not found (or if any other error occurs attempting to get it), fall
back to using a default relative path.

Use the "old" fixed name as the default.  Rename the symbol that
represents this default to emphasize its purpose.
Signed-off-by: NAlex Elder <elder@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ce062ba

dt-bindings: net: qcom,ipa: add firmware-name property · d8604b20

由 Alex Elder 提交于 4月 16, 2021

Add a new optional firmware-name property to the IPA DT node.  It
is used only if the modem is not doing early initialization (i.e.,
if the modem-init property is not present).  Its value is the name
of the firmware file to use; if it's not specified, a default name
("ipa_fws.mdt") is used.
Signed-off-by: NAlex Elder <elder@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d8604b20

virtio-net: page_to_skb() use build_skb when there's sufficient tailroom · fb32856b

由 Xuan Zhuo 提交于 4月 16, 2021

In page_to_skb(), if we have enough tailroom to save skb_shared_info, we
can use build_skb to create skb directly. No need to alloc for
additional space. And it can save a 'frags slot', which is very friendly
to GRO.

Here, if the payload of the received package is too small (less than
GOOD_COPY_LEN), we still choose to copy it directly to the space got by
napi_alloc_skb. So we can reuse these pages.

Testing Machine:
    The four queues of the network card are bound to the cpu1.

Test command:
    for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& done

The size of the udp package is 1000, so in the case of this patch, there
will always be enough tailroom to use build_skb. The sent udp packet
will be discarded because there is no port to receive it. The irqsoftd
of the machine is 100%, we observe the received quantity displayed by
sar -n DEV 1:

no build_skb:  956864.00 rxpck/s
build_skb:    1158465.00 rxpck/s
Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com>
Suggested-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fb32856b

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功