提交 · 05415bccbb09a4eb0a3d57e7edfea73d4ce15b74 · openeuler / Kernel

03 3月, 2022 4 次提交

net: rtnetlink: Propagate extack to rtnl_offload_xstats_fill() · 05415bcc

由 Petr Machata 提交于 3月 02, 2022

Later patches add handlers for more HW-backed statistics. An extack will be
useful when communicating HW / driver errors to the client. Add the
arguments as appropriate.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

05415bcc

net: rtnetlink: RTM_GETSTATS: Allow filtering inside nests · 46efc97b

由 Petr Machata 提交于 3月 02, 2022

The filter_mask field of RTM_GETSTATS header determines which top-level
attributes should be included in the netlink response. This saves
processing time by only including the bits that the user cares about
instead of always dumping everything. This is doubly important for
HW-backed statistics that would typically require a trip to the device to
fetch the stats.

So far there was only one HW-backed stat suite per attribute. However,
IFLA_STATS_LINK_OFFLOAD_XSTATS is a nest, and will gain a new stat suite in
the following patches. It would therefore be advantageous to be able to
filter within that nest, and select just one or the other HW-backed
statistics suite.

Extend rtnetlink so that RTM_GETSTATS permits attributes in the payload.
The scheme is as follows:

    - RTM_GETSTATS
	- struct if_stats_msg
	- attr nest IFLA_STATS_GET_FILTERS
	    - attr IFLA_STATS_LINK_OFFLOAD_XSTATS
		- u32 filter_mask

This scheme reuses the existing enumerators by nesting them in a dedicated
context attribute. This is covered by policies as usual, therefore a
gradual opt-in is possible. Currently only IFLA_STATS_LINK_OFFLOAD_XSTATS
nest has filtering enabled, because for the SW counters the issue does not
seem to be that important.

rtnl_offload_xstats_get_size() and _fill() are extended to observe the
requested filters.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46efc97b

net: rtnetlink: Stop assuming that IFLA_OFFLOAD_XSTATS_* are dev-backed · f6e0fb81

由 Petr Machata 提交于 3月 02, 2022

The IFLA_STATS_LINK_OFFLOAD_XSTATS attribute is a nest whose child
attributes carry various special hardware statistics. The code that handles
this nest was written with the idea that all these statistics would be
exposed by the device driver of a physical netdevice.

In the following patches, a new attribute is added to the abovementioned
nest, which however can be defined for some soft netdevices. The NDO-based
approach to querying these does not work, because it is not the soft
netdevice driver that exposes these statistics, but an offloading NIC
driver that does so.

The current code does not scale well to this usage. Simply rewrite it back
to the pattern seen in other fill-like and get_size-like functions
elsewhere.

Extract to helpers the code that is concerned with handling specifically
NDO-backed statistics so that it can be easily reused should more such
statistics be added.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6e0fb81

net: rtnetlink: Namespace functions related to IFLA_OFFLOAD_XSTATS_* · 6b524a1d

由 Petr Machata 提交于 3月 02, 2022

The currently used names rtnl_get_offload_stats() and
rtnl_get_offload_stats_size() do not clearly show the namespace. The former
function additionally seems to have been named this way in accordance with
the NDO name, as opposed to the naming used in the rtnetlink.c file (and
indeed elsewhere in the netlink handling code). As more and
differently-flavored attributes are introduced, a common clear prefix is
needed for all related functions.

Rename the functions to follow the rtnl_offload_xstats_* naming scheme.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b524a1d

17 2月, 2022 1 次提交

net: rtnetlink: rtnl_stats_get(): Emit an extack for unset filter_mask · 22b67d17

由 Petr Machata 提交于 2月 16, 2022

Both get and dump handlers for RTM_GETSTATS require that a filter_mask, a
mask of which attributes should be emitted in the netlink response, is
unset. rtnl_stats_dump() does include an extack in the bounce,
rtnl_stats_get() however does not. Fix the omission.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/01feb1f4bbd22a19f6629503c4f366aed6424567.1645020876.git.petrm@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

22b67d17

14 2月, 2022 1 次提交

net_sched: add __rcu annotation to netdev->qdisc · 5891cd5e

由 Eric Dumazet 提交于 2月 11, 2022

syzbot found a data-race [1] which lead me to add __rcu
annotations to netdev->qdisc, and proper accessors
to get LOCKDEP support.

[1]
BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu

write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1:
 attach_default_qdiscs net/sched/sch_generic.c:1167 [inline]
 dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221
 __dev_open+0x2e9/0x3a0 net/core/dev.c:1416
 __dev_change_flags+0x167/0x3f0 net/core/dev.c:8139
 rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150
 __rtnl_newlink net/core/rtnetlink.c:3489 [inline]
 rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529
 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmsg+0x195/0x230 net/socket.c:2496
 __do_sys_sendmsg net/socket.c:2505 [inline]
 __se_sys_sendmsg net/socket.c:2503 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0:
 qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323
 __tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050
 tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211
 rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585
 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmsg+0x195/0x230 net/socket.c:2496
 __do_sys_sendmsg net/socket.c:2505 [inline]
 __se_sys_sendmsg net/socket.c:2503 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 470502de ("net: sched: unlock rules update API")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5891cd5e

10 2月, 2022 1 次提交

net: make net->dev_unreg_count atomic · ede6c39c

由 Eric Dumazet 提交于 2月 09, 2022

Having to acquire rtnl from netdev_run_todo() for every dismantled
device is not desirable when/if rtnl is under stress.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ede6c39c

02 2月, 2022 1 次提交

rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink() · c6f6f244

由 Eric Dumazet 提交于 1月 31, 2022

While looking at one unrelated syzbot bug, I found the replay logic
in __rtnl_newlink() to potentially trigger use-after-free.

It is better to clear master_dev and m_ops inside the loop,
in case we have to replay it.

Fixes: ba7d49b1 ("rtnetlink: provide api for getting and setting slave info")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220201012106.216495-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

c6f6f244

06 1月, 2022 1 次提交

gro: add ability to control gro max packet size · eac1b93c

由 Coco Li 提交于 1月 05, 2022

Eric Dumazet suggested to allow users to modify max GRO packet size.

We have seen GRO being disabled by users of appliances (such as
wifi access points) because of claimed bufferbloat issues,
or some work arounds in sch_cake, to split GRO/GSO packets.

Instead of disabling GRO completely, one can chose to limit
the maximum packet size of GRO packets, depending on their
latency constraints.

This patch adds a per device gro_max_size attribute
that can be changed with ip link command.

ip link set dev eth0 gro_max_size 16000
Suggested-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NCoco Li <lixiaoyan@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eac1b93c

29 11月, 2021 1 次提交

net: Write lock dev_base_lock without disabling bottom halves. · fd888e85

由 Sebastian Andrzej Siewior 提交于 11月 26, 2021

The writer acquires dev_base_lock with disabled bottom halves.
The reader can acquire dev_base_lock without disabling bottom halves
because there is no writer in softirq context.

On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources, that are protected by disabling
bottom halves, remain protected.
This leads to a circular locking dependency if the lock acquired with
disabled bottom halves (as in write_lock_bh()) and somewhere else with
enabled bottom halves (as by read_lock() in netstat_show()) followed by
disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
-> spin_lock_bh()). This is the reverse locking order.

All read_lock() invocation are from sysfs callback which are not invoked
from softirq context. Therefore there is no need to disable bottom
halves while acquiring a write lock.

Acquire the write lock of dev_base_lock without disabling bottom halves.
Reported-by: NPei Zhang <pezhang@redhat.com>
Reported-by: NLuis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fd888e85

23 11月, 2021 1 次提交

net: remove .ndo_change_proto_down · 2106efda

由 Jakub Kicinski 提交于 11月 22, 2021

.ndo_change_proto_down was added seemingly to enable out-of-tree
implementations. Over 2.5yrs later we still have no real users
upstream. Hardwire the generic implementation for now, we can
revert once real users materialize. (rocker is a test vehicle,
not a user.)

We need to drop the optimization on the sysfs side, because
unlike ndos priv_flags will be changed at runtime, so we'd
need READ_ONCE/WRITE_ONCE everywhere..
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2106efda

22 11月, 2021 1 次提交

net: annotate accesses to dev->gso_max_segs · 6d872df3

由 Eric Dumazet 提交于 11月 19, 2021

dev->gso_max_segs is written under RTNL protection, or when the device is
not yet visible, but is read locklessly.

Add netif_set_gso_max_segs() helper.

Add the READ_ONCE()/WRITE_ONCE() pairs, and use netif_set_gso_max_segs()
where we can to better document what is going on.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d872df3

24 10月, 2021 1 次提交

net: rtnetlink: use __dev_addr_set() · efd38f75

由 Jakub Kicinski 提交于 10月 22, 2021

Get it ready for constant netdev->dev_addr.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

efd38f75

21 10月, 2021 1 次提交

net/core: Remove unused assignment operations and variable · 50af5969

由 luo penghao 提交于 10月 21, 2021

Although if_info_size is assigned, it has not been used. And the variable
should also be deleted.

The clang_analyzer complains as follows:

net/core/rtnetlink.c:3806: warning:

Although the value stored to 'if_info_size' is used in the enclosing
expression, the value is never actually read from 'if_info_size'.
Reported-by: NZeal Robot <zealci@zte.com.cn>
Signed-off-by: Nluo penghao <luo.penghao@zte.com.cn>
Reviewed-by: NSimon Horman <horms@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

50af5969

16 10月, 2021 1 次提交

net: make use of helper netif_is_bridge_master() · 254ec036

由 Kyungrok Chung 提交于 10月 16, 2021

Make use of netdev helper functions to improve code readability.
Replace 'dev->priv_flags & IFF_EBRIDGE' with netif_is_bridge_master(dev).
Signed-off-by: NKyungrok Chung <acadx0@gmail.com>
Reviewed-by: NNikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

254ec036

06 10月, 2021 1 次提交

rtnetlink: fix if_nlmsg_stats_size() under estimation · d3436799

由 Eric Dumazet 提交于 10月 05, 2021

rtnl_fill_statsinfo() is filling skb with one mandatory if_stats_msg structure.

nlmsg_put(skb, pid, seq, type, sizeof(struct if_stats_msg), flags);

But if_nlmsg_stats_size() never considered the needed storage.

This bug did not show up because alloc_skb(X) allocates skb with
extra tailroom, because of added alignments. This could very well
be changed in the future to have deterministic behavior.

Fixes: 10c9ead9 ("rtnetlink: add new RTM_GETSTATS message to dump link stats")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Acked-by: NRoopa Prabhu <roopa@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d3436799

19 9月, 2021 1 次提交

net: rtnetlink: convert rcu_assign_pointer to RCU_INIT_POINTER · 4fc29989

由 Yajun Deng 提交于 9月 18, 2021

It no need barrier when assigning a NULL value to an RCU protected
pointer. So use RCU_INIT_POINTER() instead for more fast.
Signed-off-by: NYajun Deng <yajun.deng@linux.dev>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4fc29989

26 8月, 2021 1 次提交

rtnetlink: Return correct error on changing device netns · 96a6b93b

由 Andrey Ignatov 提交于 8月 25, 2021

Currently when device is moved between network namespaces using
RTM_NEWLINK message type and one of netns attributes (FLA_NET_NS_PID,
IFLA_NET_NS_FD, IFLA_TARGET_NETNSID) but w/o specifying IFLA_IFNAME, and
target namespace already has device with same name, userspace will get
EINVAL what is confusing and makes debugging harder.

Fix it so that userspace gets more appropriate EEXIST instead what makes
debugging much easier.

Before:

  # ./ifname.sh
  + ip netns add ns0
  + ip netns exec ns0 ip link add l0 type dummy
  + ip netns exec ns0 ip link show l0
  8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 66:90:b5:d5:78:69 brd ff:ff:ff:ff:ff:ff
  + ip link add l0 type dummy
  + ip link show l0
  10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 6e:c6:1f:15:20:8d brd ff:ff:ff:ff:ff:ff
  + ip link set l0 netns ns0
  RTNETLINK answers: Invalid argument

After:

  # ./ifname.sh
  + ip netns add ns0
  + ip netns exec ns0 ip link add l0 type dummy
  + ip netns exec ns0 ip link show l0
  8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 1e:4a:72:e3:e3:8f brd ff:ff:ff:ff:ff:ff
  + ip link add l0 type dummy
  + ip link show l0
  10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether f2:fc:fe:2b:7d:a6 brd ff:ff:ff:ff:ff:ff
  + ip link set l0 netns ns0
  RTNETLINK answers: File exists

The problem is that do_setlink() passes its `char *ifname` argument,
that it gets from a caller, to __dev_change_net_namespace() as is (as
`const char *pat`), but semantics of ifname and pat can be different.

For example, __rtnl_newlink() does this:

net/core/rtnetlink.c
    3270	char ifname[IFNAMSIZ];
     ...
    3286	if (tb[IFLA_IFNAME])
    3287		nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
    3288	else
    3289		ifname[0] = '\0';
     ...
    3364	if (dev) {
     ...
    3394		return do_setlink(skb, dev, ifm, extack, tb, ifname, status);
    3395	}

, i.e. do_setlink() gets ifname pointer that is always valid no matter
if user specified IFLA_IFNAME or not and then do_setlink() passes this
ifname pointer as is to __dev_change_net_namespace() as pat argument.

But the pat (pattern) in __dev_change_net_namespace() is used as:

net/core/dev.c
   11198	err = -EEXIST;
   11199	if (__dev_get_by_name(net, dev->name)) {
   11200		/* We get here if we can't use the current device name */
   11201		if (!pat)
   11202			goto out;
   11203		err = dev_get_valid_name(net, dev, pat);
   11204		if (err < 0)
   11205			goto out;
   11206	}

As the result the `goto out` path on line 11202 is neven taken and
instead of returning EEXIST defined on line 11198,
__dev_change_net_namespace() returns an error from dev_get_valid_name()
and this, in turn, will be EINVAL for ifname[0] = '\0' set earlier.

Fixes: d8a5ec67 ("[NET]: netlink support for moving devices between network namespaces.")
Signed-off-by: NAndrey Ignatov <rdna@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

96a6b93b

11 8月, 2021 1 次提交

net: Support filtering interfaces on no master · d3432bf1

由 Lahav Schlesinger 提交于 8月 10, 2021

Currently there's support for filtering neighbours/links for interfaces
which have a specific master device (using the IFLA_MASTER/NDA_MASTER
attributes).

This patch adds support for filtering interfaces/neighbours dump for
interfaces that *don't* have a master.
Signed-off-by: NLahav Schlesinger <lschlesinger@drivenets.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210810090658.2778960-1-lschlesinger@drivenets.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

d3432bf1

04 8月, 2021 1 次提交

net: add extack arg for link ops · 8679c31e

由 Rocco Yue 提交于 8月 03, 2021

Pass extack arg to validate_linkmsg and validate_link_af callbacks.
If a netlink attribute has a reject_message, use the extended ack
mechanism to carry the message back to user space.
Signed-off-by: NRocco Yue <rocco.yue@mediatek.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8679c31e

27 7月, 2021 1 次提交

net: netlink: add the case when nlh is NULL · f9b282b3

由 Yajun Deng 提交于 7月 27, 2021

Add the case when nlh is NULL in nlmsg_report(),
so that the caller doesn't need to deal with this case.
Signed-off-by: NYajun Deng <yajun.deng@linux.dev>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f9b282b3

17 7月, 2021 1 次提交

rtnetlink: use nlmsg_notify() in rtnetlink_send() · cfdf0d9a

由 Yajun Deng 提交于 7月 15, 2021

The netlink_{broadcast, unicast} don't deal with 'if (err > 0' statement
but nlmsg_{multicast, unicast} do. The nlmsg_notify() contains them.
so use nlmsg_notify() instead. so that the caller wouldn't deal with
'if (err > 0' statement.

v2: use nlmsg_notify() will do well.
Signed-off-by: NYajun Deng <yajun.deng@linux.dev>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cfdf0d9a

30 6月, 2021 2 次提交

net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del} · 78ecc890

由 Vladimir Oltean 提交于 6月 29, 2021

"Static" is a loaded word, and probably not what the author meant when
the code was written.

In particular, this looks weird:
$ bridge fdb add dev swp0 00:01:02:03:04:05 local        # totally fine, but
$ bridge fdb add dev swp0 00:01:02:03:04:05 static
[ 2020.708298] swp0: FDB only supports static addresses  # hmm what?

By looking at the implementation which uses dev_uc_add/dev_uc_del it is
absolutely clear that only local addresses are supported, and the proper
Network Unreachability Detection state is being used for this purpose
(user space indeed sets NUD_PERMANENT when local addresses are meant).
So it is just the message that is wrong, fix it.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

78ecc890

net: use netdev_info in ndo_dflt_fdb_{add,del} · 23ac0b42

由 Vladimir Oltean 提交于 6月 29, 2021

Use the more modern printk helper for network interfaces, which also
contains information about the associated struct device, and results in
overall shorter line lengths compared to printing an open-coded
dev->name.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

23ac0b42

13 6月, 2021 3 次提交

wwan: add interface creation support · 88b71053

由 Johannes Berg 提交于 6月 12, 2021

Add support to create (and destroy) interfaces via a new
rtnetlink kind "wwan". The responsible driver has to use
the new wwan_register_ops() to make this possible.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NSergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: NLoic Poulain <loic.poulain@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

88b71053

rtnetlink: add IFLA_PARENT_[DEV|DEV_BUS]_NAME · 00e77ed8

由 Johannes Berg 提交于 6月 12, 2021

In some cases, for example in the upcoming WWAN framework changes,
there's no natural "parent netdev", so sometimes dummy netdevs are
created or similar. IFLA_PARENT_DEV_NAME is a new attribute intended to
contain a device (sysfs, struct device) name that can be used instead
when creating a new netdev, if the rtnetlink family implements it.

As suggested by Parav Pandit, we also introduce IFLA_PARENT_DEV_BUS_NAME
attribute in order to uniquely identify a device on the system (with
bus/name pair).

ip-link(8) support for the generic parent device attributes will help
us avoid code duplication, so no other link type will require a custom
code to handle the parent name attribute. E.g. the WWAN interface
creation command will looks like this:

$ ip link add wwan0-1 parent-dev wwan0 type wwan channel-id 1

So, some future subsystem (or driver) FOO will have an interface
creation command that looks like this:

$ ip link add foo1-3 parent-dev foo1 type foo bar-id 3 baz-type Y

Below is an example of dumping link info of a random device with these
new attributes:

$ ip --details link show wlp0s20f3
  4: wlp0s20f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
     state UP mode DORMANT group default qlen 1000
     ...
     parent_bus pci parent_dev 0000:00:14.3
Co-developed-by: NSergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: NSergey Ryazanov <ryazanov.s.a@gmail.com>
Co-developed-by: NLoic Poulain <loic.poulain@linaro.org>
Signed-off-by: NLoic Poulain <loic.poulain@linaro.org>
Suggested-by: NSergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

00e77ed8

rtnetlink: add alloc() method to rtnl_link_ops · 8c713dc9

由 Johannes Berg 提交于 6月 12, 2021

In order to make rtnetlink ops that can create different
kinds of devices, like what we want to add to the WWAN
framework, the priv_size and setup parameters aren't quite
sufficient. Make this easier to manage by allowing ops to
allocate their own netdev via an @alloc method that gets
the tb netlink data.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NSergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8c713dc9

10 6月, 2021 1 次提交

rtnetlink: Fix regression in bridge VLAN configuration · d2e381c4

由 Ido Schimmel 提交于 6月 09, 2021

Cited commit started returning errors when notification info is not
filled by the bridge driver, resulting in the following regression:

 # ip link add name br1 type bridge vlan_filtering 1
 # bridge vlan add dev br1 vid 555 self pvid untagged
 RTNETLINK answers: Invalid argument

As long as the bridge driver does not fill notification info for the
bridge device itself, an empty notification should not be considered as
an error. This is explained in commit 59ccaaaa ("bridge: dont send
notification when skb->len == 0 in rtnl_bridge_notify").

Fix by removing the error and add a comment to avoid future bugs.

Fixes: a8db57c1 ("rtnetlink: Fix missing error code in rtnl_bridge_notify()")
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NNikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2e381c4

04 6月, 2021 2 次提交

rtnetlink: Fix missing error code in rtnl_bridge_notify() · a8db57c1

由 Jiapeng Chong 提交于 6月 02, 2021

The error code is missing in this code scenario, add the error code
'-EINVAL' to the return value 'err'.

Eliminate the follow smatch warning:

net/core/rtnetlink.c:4834 rtnl_bridge_notify() warn: missing error code
'err'.
Reported-by: NAbaci Robot <abaci@linux.alibaba.com>
Signed-off-by: NJiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a8db57c1

rtnetlink: Fix spelling mistakes · d467d0bc

由 Zheng Yongjun 提交于 6月 02, 2021

Signed-off-by: NZheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d467d0bc

11 5月, 2021 1 次提交

rtnetlink: avoid RCU read lock when holding RTNL · a100243d

由 Cong Wang 提交于 5月 08, 2021

When we call af_ops->set_link_af() we hold a RCU read lock
as we retrieve af_ops from the RCU protected list, but this
is unnecessary because we already hold RTNL lock, which is
the writer lock for protecting rtnl_af_ops, so it is safer
than RCU read lock. Similar for af_ops->validate_link_af().

This was not a problem until we begin to take mutex lock
down the path of ->set_link_af() in __ipv6_dev_mc_dec()
recently. We can just drop the RCU read lock there and
assert RTNL lock.

Reported-and-tested-by: syzbot+7d941e89dd48bcf42573@syzkaller.appspotmail.com
Fixes: 63ed8de4 ("mld: add mc_lock for protecting per-interface mld data")
Tested-by: NTaehee Yoo <ap420073@gmail.com>
Signed-off-by: NCong Wang <cong.wang@bytedance.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a100243d

09 4月, 2021 1 次提交

ipv6: report errors for iftoken via netlink extack · 3583a4e8

由 Stephen Hemminger 提交于 4月 07, 2021

Setting iftoken can fail for several different reasons but there
and there was no report to user as to the cause. Add netlink
extended errors to the processing of the request.

This requires adding additional argument through rtnl_af_ops
set_link_af callback.
Reported-by: NHongren Zheng <li@zenithal.me>
Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3583a4e8

08 4月, 2021 2 次提交

net: remove the new_ifindex argument from dev_change_net_namespace · 0854fa82

由 Andrei Vagin 提交于 4月 06, 2021

Here is only one place where we want to specify new_ifindex. In all
other cases, callers pass 0 as new_ifindex. It looks reasonable to add a
low-level function with new_ifindex and to convert
dev_change_net_namespace to a static inline wrapper.

Fixes: eeb85a14 ("net: Allow to specify ifindex when device is moved to another namespace")
Suggested-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NAndrei Vagin <avagin@gmail.com>
Acked-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0854fa82

net: introduce nla_policy for IFLA_NEW_IFINDEX · 7e4a5131

由 Andrei Vagin 提交于 4月 06, 2021

In this case, we don't need to check that new_ifindex is positive in
validate_linkmsg.

Fixes: eeb85a14 ("net: Allow to specify ifindex when device is moved to another namespace")
Suggested-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NAndrei Vagin <avagin@gmail.com>
Acked-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7e4a5131

06 4月, 2021 1 次提交

net: Allow to specify ifindex when device is moved to another namespace · eeb85a14

由 Andrei Vagin 提交于 4月 05, 2021

Currently, we can specify ifindex on link creation. This change allows
to specify ifindex when a device is moved to another network namespace.

Even now, a device ifindex can be changed if there is another device
with the same ifindex in the target namespace. So this change doesn't
introduce completely new behavior, it adds more control to the process.

CRIU users want to restore containers with pre-created network devices.
A user will provide network devices and instructions where they have to
be restored, then CRIU will restore network namespaces and move devices
into them. The problem is that devices have to be restored with the same
indexes that they have before C/R.

Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Suggested-by: NChristian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: NAndrei Vagin <avagin@gmail.com>
Reviewed-by: NChristian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eeb85a14

04 3月, 2021 1 次提交

rtnetlink: using dev_base_seq from target net · a9ecb0cb

由 zhang kai 提交于 3月 02, 2021

Signed-off-by: Nzhang kai <zhangkaiheb@126.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a9ecb0cb

12 2月, 2021 1 次提交

net: fix dev_ifsioc_locked() race condition · 3b23a32a

由 Cong Wang 提交于 2月 11, 2021

dev_ifsioc_locked() is called with only RCU read lock, so when
there is a parallel writer changing the mac address, it could
get a partially updated mac address, as shown below:

Thread 1			Thread 2
// eth_commit_mac_addr_change()
memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
				// dev_ifsioc_locked()
				memcpy(ifr->ifr_hwaddr.sa_data,
					dev->dev_addr,...);

Close this race condition by guarding them with a RW semaphore,
like netdev_get_name(). We can not use seqlock here as it does not
allow blocking. The writers already take RTNL anyway, so this does
not affect the slow path. To avoid bothering existing
dev_set_mac_address() callers in drivers, introduce a new wrapper
just for user-facing callers on ioctl and rtnetlink paths.

Note, bonding also changes slave mac addresses but that requires
a separate patch due to the complexity of bonding code.

Fixes: 3710becf ("net: RCU locking for simple ioctl()")
Reported-by: N"Gong, Sishuai" <sishuai@purdue.edu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: NCong Wang <cong.wang@bytedance.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b23a32a

28 1月, 2021 1 次提交

net: bridge: multicast: make tracked EHT hosts limit configurable · 2dba407f

由 Nikolay Aleksandrov 提交于 1月 26, 2021

Add two new port attributes which make EHT hosts limit configurable and
export the current number of tracked EHT hosts:
 - IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT: configure/retrieve current limit
 - IFLA_BRPORT_MCAST_EHT_HOSTS_CNT: current number of tracked hosts
Setting IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT to 0 is currently not allowed.

Note that we have to increase RTNL_SLAVE_MAX_TYPE to 38 minimum, I've
increased it to 40 to have space for two more future entries.

v2: move br_multicast_eht_set_hosts_limit() to br_multicast_eht.c,
    no functional change
Signed-off-by: NNikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

2dba407f

09 1月, 2021 2 次提交

net: make free_netdev() more lenient with unregistering devices · c269a24c

由 Jakub Kicinski 提交于 1月 06, 2021

There are two flavors of handling netdev registration:
 - ones called without holding rtnl_lock: register_netdev() and
   unregister_netdev(); and
 - those called with rtnl_lock held: register_netdevice() and
   unregister_netdevice().

While the semantics of the former are pretty clear, the same can't
be said about the latter. The netdev_todo mechanism is utilized to
perform some of the device unregistering tasks and it hooks into
rtnl_unlock() so the locked variants can't actually finish the work.
In general free_netdev() does not mix well with locked calls. Most
drivers operating under rtnl_lock set dev->needs_free_netdev to true
and expect core to make the free_netdev() call some time later.

The part where this becomes most problematic is error paths. There is
no way to unwind the state cleanly after a call to register_netdevice(),
since unreg can't be performed fully without dropping locks.

Make free_netdev() more lenient, and defer the freeing if device
is being unregistered. This allows error paths to simply call
free_netdev() both after register_netdevice() failed, and after
a call to unregister_netdevice() but before dropping rtnl_lock.

Simplify the error paths which are currently doing gymnastics
around free_netdev() handling.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

c269a24c

docs: net: explain struct net_device lifetime · 2b446e65

由 Jakub Kicinski 提交于 1月 06, 2021

Explain the two basic flows of struct net_device's operation.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

2b446e65

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功