提交 · 357f203bb3b529fa7471494c7ad6a7a54d070353 · openeuler / Kernel

30 1月, 2021 12 次提交

net: dsa: keep a copy of the tagging protocol in the DSA switch tree · 357f203b

由 Vladimir Oltean 提交于 1月 29, 2021

Cascading DSA switches can be done multiple ways. There is the brute
force approach / tag stacking, where one upstream switch, located
between leaf switches and the host Ethernet controller, will just
happily transport the DSA header of those leaf switches as payload.
For this kind of setups, DSA works without any special kind of treatment
compared to a single switch - they just aren't aware of each other.
Then there's the approach where the upstream switch understands the tags
it transports from its leaves below, as it doesn't push a tag of its own,
but it routes based on the source port & switch id information present
in that tag (as opposed to DMAC & VID) and it strips the tag when
egressing a front-facing port. Currently only Marvell implements the
latter, and Marvell DSA trees contain only Marvell switches.

So it is safe to say that DSA trees already have a single tag protocol
shared by all switches, and in fact this is what makes the switches able
to understand each other. This fact is also implied by the fact that
currently, the tagging protocol is reported as part of a sysfs installed
on the DSA master and not per port, so it must be the same for all the
ports connected to that DSA master regardless of the switch that they
belong to.

It's time to make this official and enforce it (yes, this also means we
won't have any "switch understands tag to some extent but is not able to
speak it" hardware oddities that we'll support in the future).

This is needed due to the imminent introduction of the dsa_switch_ops::
change_tag_protocol driver API. When that is introduced, we'll have
to notify switches of the tagging protocol that they're configured to
use. Currently the tag_ops structure pointer is held only for CPU ports.
But there are switches which don't have CPU ports and nonetheless still
need to be configured. These would be Marvell leaf switches whose
upstream port is just a DSA link. How do we inform these of their
tagging protocol setup/deletion?

One answer to the above would be: iterate through the DSA switch tree's
ports once, list the CPU ports, get their tag_ops, then iterate again
now that we have it, and notify everybody of that tag_ops. But what to
do if conflicts appear between one cpu_dp->tag_ops and another? There's
no escaping the fact that conflict resolution needs to be done, so we
can be upfront about it.

Ease our work and just keep the master copy of the tag_ops inside the
struct dsa_switch_tree. Reference counting is now moved to be per-tree
too, instead of per-CPU port.

There are many places in the data path that access master->dsa_ptr->tag_ops
and we would introduce unnecessary performance penalty going through yet
another indirection, so keep those right where they are.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

357f203b

net: dsa: document the existing switch tree notifiers and add a new one · 886f8e26

由 Vladimir Oltean 提交于 1月 29, 2021

The existence of dsa_broadcast has generated some confusion in the past:
https://www.mail-archive.com/netdev@vger.kernel.org/msg365042.html

So let's document the existing dsa_port_notify and dsa_broadcast
functions and explain when each of them should be used.

Also, in fact, the in-between function has always been there but was
lacking a name, and is the main reason for this patch: dsa_tree_notify.
Refactor dsa_broadcast to use it.

This patch also moves dsa_broadcast (a top-level function) to dsa2.c,
where it really belonged in the first place, but had no companion so it
stood with dsa_port_notify.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

886f8e26

net: dsa: tag_8021q: add helpers to deduce whether a VLAN ID is RX or TX VLAN · 9c7caf28

由 Vladimir Oltean 提交于 1月 29, 2021

The sja1105 implementation can be blind about this, but the felix driver
doesn't do exactly what it's being told, so it needs to know whether it
is a TX or an RX VLAN, so it can install the appropriate type of TCAM
rule.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

9c7caf28

net: proc: speedup /proc/net/netstat · 0d6cd689

由 Eric Dumazet 提交于 1月 28, 2021

Use cache friendly helpers to better use cpu caches
while reading /proc/net/netstat

Tested on a platform with 256 threads (AMD Rome)

Before: 305 usec spent in netstat_seq_show()
After: 130 usec spent in netstat_seq_show()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210128162145.1703601-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

0d6cd689

net: Remove redundant calls of sk_tx_queue_clear(). · df610cd9

由 Kuniyuki Iwashima 提交于 1月 29, 2021

The commit 41b14fb8 ("net: Do not clear the sock TX queue in
sk_set_socket()") removes sk_tx_queue_clear() from sk_set_socket() and adds
it instead in sk_alloc() and sk_clone_lock() to fix an issue introduced in
the commit e022f0b4 ("net: Introduce sk_tx_queue_mapping"). On the
other hand, the original commit had already put sk_tx_queue_clear() in
sk_prot_alloc(): the callee of sk_alloc() and sk_clone_lock(). Thus
sk_tx_queue_clear() is called twice in each path.

If we remove sk_tx_queue_clear() in sk_alloc() and sk_clone_lock(), it
currently works well because (i) sk_tx_queue_mapping is defined between
sk_dontcopy_begin and sk_dontcopy_end, and (ii) sock_copy() called after
sk_prot_alloc() in sk_clone_lock() does not overwrite sk_tx_queue_mapping.
However, if we move sk_tx_queue_mapping out of the no copy area, it
introduces a bug unintentionally.

Therefore, this patch adds a compile-time check to take care of the order
of sock_copy() and sk_tx_queue_clear() and removes sk_tx_queue_clear() from
sk_prot_alloc() so that it does the only allocation and its callers
initialize fields.

CC: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: NTariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210128150217.6060-1-kuniyu@amazon.co.jpSigned-off-by: NJakub Kicinski <kuba@kernel.org>

df610cd9

ip_gre: add csum offload support for gre header · efa1a65c

由 Xin Long 提交于 1月 28, 2021

This patch is to add csum offload support for gre header:

On the TX path in gre_build_header(), when CHECKSUM_PARTIAL's set
for inner proto, it will calculate the csum for outer proto, and
inner csum will be offloaded later. Otherwise, CHECKSUM_PARTIAL
and csum_start/offset will be set for outer proto, and the outer
csum will be offloaded later.

On the GSO path in gre_gso_segment(), when CHECKSUM_PARTIAL is
not set for inner proto and the hardware supports csum offload,
CHECKSUM_PARTIAL and csum_start/offset will be set for outer
proto, and outer csum will be offloaded later. Otherwise, it
will do csum for outer proto by calling gso_make_checksum().

Note that SCTP has to do the csum by itself for non GSO path in
sctp_packet_pack(), as gre_build_header() can't handle the csum
with CHECKSUM_PARTIAL set for SCTP CRC csum offload.

v1->v2:
  - remove the SCTP part, as GRE dev doesn't support SCTP CRC CSUM
    and it will always do checksum for SCTP in sctp_packet_pack()
    when it's not a GSO packet.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

efa1a65c

net: support ip generic csum processing in skb_csum_hwoffload_help · 62fafcd6

由 Xin Long 提交于 1月 28, 2021

NETIF_F_IP|IPV6_CSUM feature flag indicates UDP and TCP csum offload
while NETIF_F_HW_CSUM feature flag indicates ip generic csum offload
for HW, which includes not only for TCP/UDP csum, but also for other
protocols' csum like GRE's.

However, in skb_csum_hwoffload_help() it only checks features against
NETIF_F_CSUM_MASK(NETIF_F_HW|IP|IPV6_CSUM). So if it's a non TCP/UDP
packet and the features doesn't support NETIF_F_HW_CSUM, but supports
NETIF_F_IP|IPV6_CSUM only, it would still return 0 and leave the HW
to do csum.

This patch is to support ip generic csum processing by checking
NETIF_F_HW_CSUM for all protocols, and check (NETIF_F_IP_CSUM |
NETIF_F_IPV6_CSUM) only for TCP and UDP.

Note that we're using skb->csum_offset to check if it's a TCP/UDP
proctol, this might be fragile. However, as Alex said, for now we
only have a few L4 protocols that are requesting Tx csum offload,
we'd better fix this until a new protocol comes with a same csum
offset.

v1->v2:
  - not extend skb->csum_not_inet, but use skb->csum_offset to tell
    if it's an UDP/TCP csum packet.
v2->v3:
  - add a note in the changelog, as Willem suggested.
Suggested-by: NAlexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

62fafcd6

net: atm: pppoatm: use new API for wakeup tasklet · a5874597

由 Emil Renner Berthing 提交于 1月 27, 2021

This converts the driver to use the new tasklet API introduced in
commit 12cc923f ("tasklet: Introduce new initialization API")
Signed-off-by: NEmil Renner Berthing <kernel@esmil.dk>
Link: https://lore.kernel.org/r/20210127173256.13954-2-kernel@esmil.dkSigned-off-by: NJakub Kicinski <kuba@kernel.org>

a5874597

net: atm: pppoatm: use tasklet_init to initialize wakeup tasklet · a5b88632

由 Emil Renner Berthing 提交于 1月 27, 2021

Previously a temporary tasklet structure was initialized on the stack
using DECLARE_TASKLET_OLD() and then copied over and modified. Nothing
else in the kernel seems to use this pattern, so let's just call
tasklet_init() like everyone else.
Signed-off-by: NEmil Renner Berthing <kernel@esmil.dk>
Link: https://lore.kernel.org/r/20210127173256.13954-1-kernel@esmil.dkSigned-off-by: NJakub Kicinski <kuba@kernel.org>

a5b88632

net: flow_offload: Add original direction flag to ct_metadata · 941eff5a

由 Paul Blakey 提交于 1月 27, 2021

Give offloading drivers the direction of the offloaded ct flow,
this will be used for matches on direction (ct_state +/-rpl).
Signed-off-by: NPaul Blakey <paulb@nvidia.com>
Reviewed-by: NJiri Pirko <jiri@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

941eff5a

net/sched: cls_flower: Add match on the ct_state reply flag · 8c85d18c

由 Paul Blakey 提交于 1月 27, 2021

Add match on the ct_state reply flag.

Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
  ct_state +trk+est+rpl \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_1 ingress prio 1 chain 1 proto ip flower \
  ct_state +trk+est-rpl \
  action mirred egress redirect dev ens1f0_0
Signed-off-by: NPaul Blakey <paulb@nvidia.com>
Reviewed-by: NJiri Pirko <jiri@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

8c85d18c

net: packet: make pkt_sk() inline · 8c224751

由 Menglong Dong 提交于 1月 27, 2021

It's better make 'pkt_sk()' inline here, as non-inline function
shouldn't occur in headers. Besides, this function is simple
enough to be inline.
Signed-off-by: NMenglong Dong <dong.menglong@zte.com.cn>
Link: https://lore.kernel.org/r/20210127123302.29842-1-dong.menglong@zte.com.cnSigned-off-by: NJakub Kicinski <kuba@kernel.org>

8c224751

29 1月, 2021 19 次提交

nexthop: Extract a helper for validation of get/del RTNL requests · 0bccf8ed

由 Petr Machata 提交于 1月 28, 2021

Validation of messages for get / del of a next hop is the same as will be
validation of messages for get of a resilient next hop group bucket. The
difference is that policy for resilient next hop group buckets is a
superset of that used for next-hop get.

It is therefore possible to reuse the code that validates the nhmsg fields,
extracts the next-hop ID, and validates that. To that end, extract from
nh_valid_get_del_req() a helper __nh_valid_get_del_req() that does just
that.

Make the nlh argument const so that the function can be called from the
dump context, which only has a const nlh. Propagate the constness to
nh_valid_get_del_req().
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

0bccf8ed

nexthop: Add a callback parameter to rtm_dump_walk_nexthops() · e948217d

由 Petr Machata 提交于 1月 28, 2021

In order to allow different handling for next-hop tree dumper and for
bucket dumper, parameterize the next-hop tree walker with a callback. Add
rtm_dump_nexthop_cb() with just the bits relevant for next-hop tree
dumping.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

e948217d

nexthop: Extract a helper for walking the next-hop tree · cbee1807

由 Petr Machata 提交于 1月 28, 2021

Extract from rtm_dump_nexthop() a helper to walk the next hop tree. A
separate function for this will be reusable from the bucket dumper.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

cbee1807

nexthop: Strongly-type context of rtm_dump_nexthop() · a6fbbaa6

由 Petr Machata 提交于 1月 28, 2021

The dump operations need to keep state from one invocation to another. A
scratch area is dedicated for this purpose in the passed-in argument, cb,
namely via two aliased arrays, struct netlink_callback.args and .ctx.

Dumping of buckets will end up having to iterate over next hops as well,
and it would be nice to be able to reuse the iteration logic with the NH
dumper. The fact that the logic currently relies on fixed index to the
.args array, and the indices would have to be coordinated between the two
dumpers, makes this somewhat awkward.

To make the access patters clearer, introduce a helper struct with a NH
index, and instead of using the .args array directly, use it through this
structure.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

a6fbbaa6

nexthop: Extract a common helper for parsing dump attributes · b9ebea12

由 Petr Machata 提交于 1月 28, 2021

Requests to dump nexthops have many attributes in common with those that
requests to dump buckets of resilient NH groups will have. However, they
have different policies. To allow reuse of this code, extract a
policy-agnostic wrapper out of nh_valid_dump_req(), and convert this
function into a thin wrapper around it.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

b9ebea12

nexthop: Extract dump filtering parameters into a single structure · 56450ec6

由 Petr Machata 提交于 1月 28, 2021

Requests to dump nexthops have many attributes in common with those that
requests to dump buckets of resilient NH groups will have. In order to make
reuse of this code simpler, convert the code to use a single structure with
filtering configuration instead of passing around the parameters one by
one.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

56450ec6

nexthop: Dispatch notifier init()/fini() by group type · da230501

由 Petr Machata 提交于 1月 28, 2021

After there are several next-hop group types, initialization and
finalization of notifier type needs to reflect the actual type. Transform
nh_notifier_grp_info_init() and _fini() to make extending them easier.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

da230501

nexthop: Use enum to encode notification type · 09ad6bec

由 Ido Schimmel 提交于 1月 28, 2021

Currently there are only two types of in-kernel nexthop notification.
The two are distinguished by the 'is_grp' boolean field in 'struct
nh_notifier_info'.

As more notification types are introduced for more next-hop group types, a
boolean is not an easily extensible interface. Instead, convert it to an
enum.
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

09ad6bec

nexthop: Assert the invariant that a NH group is of only one type · 720ccd9a

由 Petr Machata 提交于 1月 28, 2021

Most of the code that deals with nexthop groups relies on the fact that the
group is of exactly one well-known type. Currently there is only one type,
"mpath", but as more next-hop group types come, it becomes desirable to
have a central place where the setting is validated. Introduce such place
into nexthop_create_group(), such that the check is done before the code
that relies on that invariant is invoked.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

720ccd9a

nexthop: Introduce to struct nh_grp_entry a per-type union · b9bae61b

由 Petr Machata 提交于 1月 28, 2021

The values that a next-hop group needs to keep track of depend on the group
type. Introduce a union to separate fields specific to the mpath groups
from fields specific to other group types.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

b9bae61b

nexthop: Dispatch nexthop_select_path() by group type · 79bc55e3

由 Petr Machata 提交于 1月 28, 2021

The logic for selecting path depends on the next-hop group type. Adapt the
nexthop_select_path() to dispatch according to the group type.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

79bc55e3

nexthop: Rename nexthop_free_mpath · 5d1f0f09

由 David Ahern 提交于 1月 28, 2021

nexthop_free_mpath really should be nexthop_free_group. Rename it.
Signed-off-by: NDavid Ahern <dsahern@kernel.org>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

5d1f0f09

net/af_iucv: build SG skbs for TRANS_HIPER sockets · 2c3b4456

由 Julian Wiedmann 提交于 1月 28, 2021

The TX path no longer falls apart when some of its SG skbs are later
linearized by lower layers of the stack. So enable the use of SG skbs
in iucv_sock_sendmsg() again.

This effectively reverts
commit dc5367bc ("net/af_iucv: don't use paged skbs for TX on HiperSockets").
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

2c3b4456

net/af_iucv: don't track individual TX skbs for TRANS_HIPER sockets · 80bc97aa

由 Julian Wiedmann 提交于 1月 28, 2021

Stop maintaining the skb_send_q list for TRANS_HIPER sockets.

Not only is it extra overhead, but keeping around a list of skb clones
means that we later also have to match the ->sk_txnotify() calls
against these clones and free them accordingly.
The current matching logic (comparing the skbs' shinfo location) is
frustratingly fragile, and breaks if the skb's head is mangled in any
sort of way while passing from dev_queue_xmit() to the device's
HW queue.

Also adjust the interface for ->sk_txnotify(), to make clear that we
don't actually care about any skb internals.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

80bc97aa

net/af_iucv: count packets in the xmit path · ef6af7bd

由 Julian Wiedmann 提交于 1月 28, 2021

The TX code keeps track of all skbs that are in-flight but haven't
actually been sent out yet. For native IUCV sockets that's not a huge
deal, but with TRANS_HIPER sockets it would be much better if we
didn't need to maintain a list of skb clones.

Note that we actually only care about the _count_ of skbs in this stage
of the TX pipeline. So as prep work for removing the skb tracking on
TRANS_HIPER sockets, keep track of the skb count in a separate variable
and pair any list {enqueue, unlink} with a count {increment, decrement}.

Then replace all occurences where we currently look at the skb list's
fill level.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

ef6af7bd

net/af_iucv: don't lookup the socket on TX notification · c464444f

由 Julian Wiedmann 提交于 1月 28, 2021

Whoever called iucv_sk(sk)->sk_txnotify() must already know that they're
dealing with an af_iucv socket.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

c464444f

net/af_iucv: remove WARN_ONCE on malformed RX packets · 27e9c1de

由 Alexander Egorenkov 提交于 1月 28, 2021

syzbot reported the following finding:

AF_IUCV failed to receive skb, len=0
WARNING: CPU: 0 PID: 522 at net/iucv/af_iucv.c:2039 afiucv_hs_rcv+0x174/0x190 net/iucv/af_iucv.c:2039
CPU: 0 PID: 522 Comm: syz-executor091 Not tainted 5.10.0-rc1-syzkaller-07082-g55027a88ec9f #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
 [<00000000b87ea538>] afiucv_hs_rcv+0x178/0x190 net/iucv/af_iucv.c:2039
([<00000000b87ea534>] afiucv_hs_rcv+0x174/0x190 net/iucv/af_iucv.c:2039)
 [<00000000b796533e>] __netif_receive_skb_one_core+0x13e/0x188 net/core/dev.c:5315
 [<00000000b79653ce>] __netif_receive_skb+0x46/0x1c0 net/core/dev.c:5429
 [<00000000b79655fe>] netif_receive_skb_internal+0xb6/0x220 net/core/dev.c:5534
 [<00000000b796ac3a>] netif_receive_skb+0x42/0x318 net/core/dev.c:5593
 [<00000000b6fd45f4>] tun_rx_batched.isra.0+0x6fc/0x860 drivers/net/tun.c:1485
 [<00000000b6fddc4e>] tun_get_user+0x1c26/0x27f0 drivers/net/tun.c:1939
 [<00000000b6fe0f00>] tun_chr_write_iter+0x158/0x248 drivers/net/tun.c:1968
 [<00000000b4f22bfa>] call_write_iter include/linux/fs.h:1887 [inline]
 [<00000000b4f22bfa>] new_sync_write+0x442/0x648 fs/read_write.c:518
 [<00000000b4f238fe>] vfs_write.part.0+0x36e/0x5d8 fs/read_write.c:605
 [<00000000b4f2984e>] vfs_write+0x10e/0x148 fs/read_write.c:615
 [<00000000b4f29d0e>] ksys_write+0x166/0x290 fs/read_write.c:658
 [<00000000b8dc4ab4>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
Last Breaking-Event-Address:
 [<00000000b8dc64d4>] __s390_indirect_jump_r14+0x0/0xc

Malformed RX packets shouldn't generate any warnings because
debugging info already flows to dropmon via the kfree_skb().
Signed-off-by: NAlexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

27e9c1de

rxrpc: Fix memory leak in rxrpc_lookup_local · b8323f72

由 Takeshi Misawa 提交于 1月 28, 2021

Commit 9ebeddef ("rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record")
Then release ref in __rxrpc_put_peer and rxrpc_put_peer_locked.

	struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp)
	-               peer->local = local;
	+               peer->local = rxrpc_get_local(local);

rxrpc_discard_prealloc also need ref release in discarding.

syzbot report:
BUG: memory leak
unreferenced object 0xffff8881080ddc00 (size 256):
  comm "syz-executor339", pid 8462, jiffies 4294942238 (age 12.350s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 0a 00 00 00 00 c0 00 08 81 88 ff ff  ................
  backtrace:
    [<000000002b6e495f>] kmalloc include/linux/slab.h:552 [inline]
    [<000000002b6e495f>] kzalloc include/linux/slab.h:682 [inline]
    [<000000002b6e495f>] rxrpc_alloc_local net/rxrpc/local_object.c:79 [inline]
    [<000000002b6e495f>] rxrpc_lookup_local+0x1c1/0x760 net/rxrpc/local_object.c:244
    [<000000006b43a77b>] rxrpc_bind+0x174/0x240 net/rxrpc/af_rxrpc.c:149
    [<00000000fd447a55>] afs_open_socket+0xdb/0x200 fs/afs/rxrpc.c:64
    [<000000007fd8867c>] afs_net_init+0x2b4/0x340 fs/afs/main.c:126
    [<0000000063d80ec1>] ops_init+0x4e/0x190 net/core/net_namespace.c:152
    [<00000000073c5efa>] setup_net+0xde/0x2d0 net/core/net_namespace.c:342
    [<00000000a6744d5b>] copy_net_ns+0x19f/0x3e0 net/core/net_namespace.c:483
    [<0000000017d3aec3>] create_new_namespaces+0x199/0x4f0 kernel/nsproxy.c:110
    [<00000000186271ef>] unshare_nsproxy_namespaces+0x9b/0x120 kernel/nsproxy.c:226
    [<000000002de7bac4>] ksys_unshare+0x2fe/0x5c0 kernel/fork.c:2957
    [<00000000349b12ba>] __do_sys_unshare kernel/fork.c:3025 [inline]
    [<00000000349b12ba>] __se_sys_unshare kernel/fork.c:3023 [inline]
    [<00000000349b12ba>] __x64_sys_unshare+0x12/0x20 kernel/fork.c:3023
    [<000000006d178ef7>] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    [<00000000637076d4>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 9ebeddef ("rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record")
Signed-off-by: NTakeshi Misawa <jeliantsurux@gmail.com>
Reported-and-tested-by: syzbot+305326672fed51b205f7@syzkaller.appspotmail.com
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/161183091692.3506637.3206605651502458810.stgit@warthog.procyon.org.ukSigned-off-by: NJakub Kicinski <kuba@kernel.org>

b8323f72

net: reduce indentation level in sk_clone_lock() · bbc20b70

由 Eric Dumazet 提交于 1月 27, 2021

Rework initial test to jump over init code
if memory allocation has failed.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210127152731.748663-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

bbc20b70

28 1月, 2021 9 次提交

devlink: Add DMAC filter generic packet trap · e78ab164

由 Aya Levin 提交于 1月 26, 2021

Add packet trap that can report packets that were dropped due to
destination MAC filtering.
Signed-off-by: NAya Levin <ayal@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

e78ab164

tipc: remove duplicated code in tipc_msg_create · 2a9063b7

由 Hoang Huu Le 提交于 1月 27, 2021

Remove a duplicate code checking for header size in tipc_msg_create() as
it's already being done in tipc_msg_init().
Acked-by: NJon Maloy <jmaloy@redhat.com>
Signed-off-by: NHoang Huu Le <hoang.h.le@dektech.com.au>
Link: https://lore.kernel.org/r/20210127025123.6390-1-hoang.h.le@dektech.com.auSigned-off-by: NJakub Kicinski <kuba@kernel.org>

2a9063b7

net: bridge: multicast: make tracked EHT hosts limit configurable · 2dba407f

由 Nikolay Aleksandrov 提交于 1月 26, 2021

Add two new port attributes which make EHT hosts limit configurable and
export the current number of tracked EHT hosts:
 - IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT: configure/retrieve current limit
 - IFLA_BRPORT_MCAST_EHT_HOSTS_CNT: current number of tracked hosts
Setting IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT to 0 is currently not allowed.

Note that we have to increase RTNL_SLAVE_MAX_TYPE to 38 minimum, I've
increased it to 40 to have space for two more future entries.

v2: move br_multicast_eht_set_hosts_limit() to br_multicast_eht.c,
    no functional change
Signed-off-by: NNikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

2dba407f

net: bridge: multicast: add per-port EHT hosts limit · 89268b05

由 Nikolay Aleksandrov 提交于 1月 26, 2021

Add a default limit of 512 for number of tracked EHT hosts per-port.
Signed-off-by: NNikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

89268b05

net: decnet: fix netdev refcount leaking on error path · 3f96d644

由 Vadim Fedorenko 提交于 1月 26, 2021

On building the route there is an assumption that the destination
could be local. In this case loopback_dev is used to get the address.
If the address is still cannot be retrieved dn_route_output_slow
returns EADDRNOTAVAIL with loopback_dev reference taken.

Cannot find hash for the fixes tag because this code was introduced
long time ago. I don't think that this bug has ever fired but the
patch is done just to have a consistent code base.
Signed-off-by: NVadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/1611619334-20955-1-git-send-email-vfedorenko@novek.ruSigned-off-by: NJakub Kicinski <kuba@kernel.org>

3f96d644

net: remove redundant 'depends on NET' · 864e898b

由 Masahiro Yamada 提交于 1月 26, 2021

These Kconfig files are included from net/Kconfig, inside the
if NET ... endif.

Remove 'depends on NET', which we know it is already met.
Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20210125232026.106855-1-masahiroy@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>

864e898b

net: l3mdev: use obj-$(CONFIG_NET_L3_MASTER_DEV) form in net/Makefile · d32f834c

由 Masahiro Yamada 提交于 1月 26, 2021

CONFIG_NET_L3_MASTER_DEV is a bool option. Change the ifeq conditional
to the standard obj-$(CONFIG_NET_L3_MASTER_DEV) form.

Use obj-y in net/l3mdev/Makefile because Kbuild visits this Makefile
only when CONFIG_NET_L3_MASTER_DEV=y.
Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210125231659.106201-4-masahiroy@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>

d32f834c

net: switchdev: use obj-$(CONFIG_NET_SWITCHDEV) form in net/Makefile · 0cfd99b4

由 Masahiro Yamada 提交于 1月 26, 2021

CONFIG_NET_SWITCHDEV is a bool option. Change the ifeq conditional to
the standard obj-$(CONFIG_NET_SWITCHDEV) form.

Use obj-y in net/switchdev/Makefile because Kbuild visits this Makefile
only when CONFIG_NET_SWITCHDEV=y.
Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20210125231659.106201-3-masahiroy@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>

0cfd99b4

net: dcb: use obj-$(CONFIG_DCB) form in net/Makefile · 1e328ed5

由 Masahiro Yamada 提交于 1月 26, 2021

CONFIG_DCB is a bool option. Change the ifeq conditional to the
standard obj-$(CONFIG_DCB) form.

Use obj-y in net/dcb/Makefile because Kbuild visits this Makefile
only when CONFIG_DCB=y.
Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20210125231659.106201-2-masahiroy@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>

1e328ed5

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功