提交 · cfc111d5391dda87310e7a385a9e17f300a8a97c · openeuler / Kernel

17 8月, 2022 2 次提交

net: sched: delete unused input parameter in qdisc_create · cfc111d5

由 Zhengchao Shao 提交于 8月 15, 2022

The input parameter p is unused in qdisc_create. Delete it.
Signed-off-by: NZhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220815061023.51318-1-shaozhengchao@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

cfc111d5

net: sched: remove the unused return value of unregister_qdisc · 52327d2e

由 Zhengchao Shao 提交于 8月 15, 2022

Return value of unregister_qdisc is unused, remove it.
Signed-off-by: NZhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220815030417.271894-1-shaozhengchao@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

52327d2e

10 6月, 2022 1 次提交

net: rename reference+tracking helpers · d62607c3

由 Jakub Kicinski 提交于 6月 07, 2022

Netdev reference helpers have a dev_ prefix for historic
reasons. Renaming the old helpers would be too much churn
but we can rename the tracking ones which are relatively
recent and should be the default for new code.

Rename:
 dev_hold_track()    -> netdev_hold()
 dev_put_track()     -> netdev_put()
 dev_replace_track() -> netdev_ref_replace()

Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>

d62607c3

14 2月, 2022 1 次提交

net_sched: add __rcu annotation to netdev->qdisc · 5891cd5e

由 Eric Dumazet 提交于 2月 11, 2022

syzbot found a data-race [1] which lead me to add __rcu
annotations to netdev->qdisc, and proper accessors
to get LOCKDEP support.

[1]
BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu

write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1:
 attach_default_qdiscs net/sched/sch_generic.c:1167 [inline]
 dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221
 __dev_open+0x2e9/0x3a0 net/core/dev.c:1416
 __dev_change_flags+0x167/0x3f0 net/core/dev.c:8139
 rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150
 __rtnl_newlink net/core/rtnetlink.c:3489 [inline]
 rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529
 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmsg+0x195/0x230 net/socket.c:2496
 __do_sys_sendmsg net/socket.c:2505 [inline]
 __se_sys_sendmsg net/socket.c:2503 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0:
 qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323
 __tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050
 tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211
 rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585
 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmsg+0x195/0x230 net/socket.c:2496
 __do_sys_sendmsg net/socket.c:2505 [inline]
 __se_sys_sendmsg net/socket.c:2503 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 470502de ("net: sched: unlock rules update API")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5891cd5e

20 1月, 2022 1 次提交

net: sched: Clarify error message when qdisc kind is unknown · 973bf8fd

由 Victor Nogueira 提交于 1月 18, 2022

When adding a tc rule with a qdisc kind that is not supported or not
compiled into the kernel, the kernel emits the following error: "Error:
Specified qdisc not found.". Found via tdc testing when ETS qdisc was not
compiled in and it was not obvious right away what the message meant
without looking at the kernel code.

Change the error message to be more explicit and say the qdisc kind is
unknown.
Signed-off-by: NVictor Nogueira <victor@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

973bf8fd

13 1月, 2022 1 次提交

sch_api: Don't skip qdisc attach on ingress · de2d807b

由 Maxim Mikityanskiy 提交于 1月 12, 2022

The attach callback of struct Qdisc_ops is used by only a few qdiscs:
mq, mqprio and htb. qdisc_graft() contains the following logic
(pseudocode):

    if (!qdisc->ops->attach) {
        if (ingress)
            do ingress stuff;
        else
            do egress stuff;
    }
    if (!ingress) {
        ...
        if (qdisc->ops->attach)
            qdisc->ops->attach(qdisc);
    } else {
        ...
    }

As we see, the attach callback is not called if the qdisc is being
attached to ingress (TC_H_INGRESS). That wasn't a problem for mq and
mqprio, since they contain a check that they are attached to TC_H_ROOT,
and they can't be attached to TC_H_INGRESS anyway.

However, the commit cited below added the attach callback to htb. It is
needed for the hardware offload, but in the non-offload mode it
simulates the "do egress stuff" part of the pseudocode above. The
problem is that when htb is attached to ingress, neither "do ingress
stuff" nor attach() is called. It results in an inconsistency, and the
following message is printed to dmesg:

unregister_netdevice: waiting for lo to become free. Usage count = 2

This commit addresses the issue by running "do ingress stuff" in the
ingress flow even in the attach callback is present, which is fine,
because attach isn't going to be called afterwards.

The bug was found by syzbot and reported by Eric.

Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
Signed-off-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
Reported-by: NEric Dumazet <edumazet@google.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de2d807b

06 1月, 2022 1 次提交

net/sched: add missing tracker information in qdisc_create() · 88248c35

由 Eric Dumazet 提交于 1月 04, 2022

qdisc_create() error path needs to use dev_put_track()
because qdisc_alloc() allocated the tracker.

Fixes: 606509f2 ("net/sched: add net device refcount tracker to struct Qdisc")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220104170439.3790052-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

88248c35

18 10月, 2021 2 次提交

net: sched: Remove Qdisc::running sequence counter · 29cbcd85

由 Ahmed S. Darwish 提交于 10月 16, 2021

The Qdisc::running sequence counter has two uses:

  1. Reliably reading qdisc's tc statistics while the qdisc is running
     (a seqcount read/retry loop at gnet_stats_add_basic()).

  2. As a flag, indicating whether the qdisc in question is running
     (without any retry loops).

For the first usage, the Qdisc::running sequence counter write section,
qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
is actually needed: the raw qdisc's bstats update. A u64_stats sync
point was thus introduced (in previous commits) inside the bstats
structure itself. A local u64_stats write section is then started and
stopped for the bstats updates.

Use that u64_stats sync point mechanism for the bstats read/retry loop
at gnet_stats_add_basic().

For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
accessed with atomic bitops, is sufficient. Using a bit flag instead of
a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
to the SMP barriers implicitly added through raw_read_seqcount() and
write_seqcount_begin/end() getting removed. All call sites have been
surveyed though, and no required ordering was identified.

Now that the qdisc->running sequence counter is no longer used, remove
it.

Note, using u64_stats implies no sequence counter protection for 64-bit
architectures. This can lead to the qdisc tc statistics "packets" vs.
"bytes" values getting out of sync on rare occasions. The individual
values will still be valid.
Signed-off-by: NAhmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29cbcd85

net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types · 50dc9a85

由 Ahmed S. Darwish 提交于 10月 16, 2021

The only factor differentiating per-CPU bstats data type (struct
gnet_stats_basic_cpu) from the packed non-per-CPU one (struct
gnet_stats_basic_packed) was a u64_stats sync point inside the former.
The two data types are now equivalent: earlier commits added a u64_stats
sync point to the latter.

Combine both data types into "struct gnet_stats_basic_sync". This
eliminates redundancy and simplifies the bstats read/write APIs.

Use u64_stats_t for bstats "packets" and "bytes" data types. On 64-bit
architectures, u64_stats sync points do not use sequence counter
protection.
Signed-off-by: NAhmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

50dc9a85

30 9月, 2021 1 次提交

net_sched: Use struct_size() and flex_array_size() helpers · 69508d43

由 Gustavo A. R. Silva 提交于 9月 28, 2021

Make use of the struct_size() and flex_array_size() helpers instead of
an open-coded version, in order to avoid any potential type mistakes
or integer overflows that, in the worse scenario, could lead to heap
overflows.

Link: https://github.com/KSPP/linux/issues/160Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20210928193107.GA262595@embeddedorSigned-off-by: NJakub Kicinski <kuba@kernel.org>

69508d43

26 9月, 2021 1 次提交

王

net: prevent user from passing illegal stab size · b193e15a

由王贇提交于 9月 24, 2021

We observed below report when playing with netlink sock:

  UBSAN: shift-out-of-bounds in net/sched/sch_api.c:580:10
  shift exponent 249 is too large for 32-bit type
  CPU: 0 PID: 685 Comm: a.out Not tainted
  Call Trace:
   dump_stack_lvl+0x8d/0xcf
   ubsan_epilogue+0xa/0x4e
   __ubsan_handle_shift_out_of_bounds+0x161/0x182
   __qdisc_calculate_pkt_len+0xf0/0x190
   __dev_queue_xmit+0x2ed/0x15b0

it seems like kernel won't check the stab log value passing from
user, and will use the insane value later to calculate pkt_len.

This patch just add a check on the size/cell_log to avoid insane
calculation.
Reported-by: NAbaci <abaci@linux.alibaba.com>
Signed-off-by: NMichael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b193e15a

17 7月, 2021 1 次提交

net/sched: Remove unnecessary if statement · f79a3bcb

由 Yajun Deng 提交于 7月 15, 2021

It has been deal with the 'if (err' statement in rtnetlink_send()
and rtnl_unicast(). so remove unnecessary if statement.

v2: use the raw name rtnetlink_send().
Signed-off-by: NYajun Deng <yajun.deng@linux.dev>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f79a3bcb

05 3月, 2021 1 次提交

net: sched: avoid duplicates in classes dump · bfc25605

由 Maximilian Heyne 提交于 3月 04, 2021

This is a follow up of commit ea327469 ("net: sched: avoid
duplicates in qdisc dump") which has fixed the issue only for the qdisc
dump.

The duplicate printing also occurs when dumping the classes via
  tc class show dev eth0

Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
Signed-off-by: NMaximilian Heyne <mheyne@amazon.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bfc25605

23 1月, 2021 1 次提交

net: sched: Add extack to Qdisc_class_ops.delete · 4dd78a73

由 Maxim Mikityanskiy 提交于 1月 19, 2021

In a following commit, sch_htb will start using extack in the delete
class operation to pass hardware errors in offload mode. This commit
prepares for that by adding the extack parameter to this callback and
converting usage of the existing qdiscs.
Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

4dd78a73

16 1月, 2021 1 次提交

net_sched: reject silly cell_log in qdisc_get_rtab() · e4bedf48

由 Eric Dumazet 提交于 1月 14, 2021

iproute2 probably never goes beyond 8 for the cell exponent,
but stick to the max shift exponent for signed 32bit.

UBSAN reported:
UBSAN: shift-out-of-bounds in net/sched/sch_api.c:389:22
shift exponent 130 is too large for 32-bit type 'int'
CPU: 1 PID: 8450 Comm: syz-executor586 Not tainted 5.11.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x183/0x22e lib/dump_stack.c:120
 ubsan_epilogue lib/ubsan.c:148 [inline]
 __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395
 __detect_linklayer+0x2a9/0x330 net/sched/sch_api.c:389
 qdisc_get_rtab+0x2b5/0x410 net/sched/sch_api.c:435
 cbq_init+0x28f/0x12c0 net/sched/sch_cbq.c:1180
 qdisc_create+0x801/0x1470 net/sched/sch_api.c:1246
 tc_modify_qdisc+0x9e3/0x1fc0 net/sched/sch_api.c:1662
 rtnetlink_rcv_msg+0xb1d/0xe60 net/core/rtnetlink.c:5564
 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2494
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg net/socket.c:672 [inline]
 ____sys_sendmsg+0x5a2/0x900 net/socket.c:2345
 ___sys_sendmsg net/socket.c:2399 [inline]
 __sys_sendmsg+0x319/0x400 net/socket.c:2432
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Acked-by: NCong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20210114160637.1660597-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

e4bedf48

02 12月, 2020 1 次提交

net: sched: remove redundant 'rtnl_held' argument · 0fca55ed

由 Vlad Buslov 提交于 11月 27, 2020

Functions tfilter_notify_chain() and tcf_get_next_proto() are always called
with rtnl lock held in current implementation. Moreover, attempting to call
them without rtnl lock would cause a warning down the call chain in
function __tcf_get_next_proto() that requires the lock to be held by
callers. Remove the 'rtnl_held' argument in order to simplify the code and
make rtnl lock requirement explicit.
Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
Link: https://lore.kernel.org/r/20201127151205.23492-1-vladbu@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

0fca55ed

17 11月, 2020 2 次提交

treewide: rename nla_strlcpy to nla_strscpy. · 872f6903

由 Francis Laniel 提交于 11月 15, 2020

Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.
Signed-off-by: NFrancis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

872f6903

Modify return value of nla_strlcpy to match that of strscpy. · 9ca71874

由 Francis Laniel 提交于 11月 15, 2020

nla_strlcpy now returns -E2BIG if src was truncated when written to dst.
It also returns this error value if dstsize is 0 or higher than INT_MAX.

For example, if src is "foo\0" and dst is 3 bytes long, the result will be:
1. "foG" after memcpy (G means garbage).
2. "fo\0" after memset.
3. -E2BIG is returned because src was not completely written into dst.

The callers of nla_strlcpy were modified to take into account this modification.
Signed-off-by: NFrancis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

9ca71874

21 7月, 2020 1 次提交

sched: sch_api: add missing rcu read lock to silence the warning · a8b7b2d0

由 Jiri Pirko 提交于 7月 20, 2020

In case the qdisc_match_from_root function() is called from non-rcu path
with rtnl mutex held, a suspiciout rcu usage warning appears:

[  241.504354] =============================
[  241.504358] WARNING: suspicious RCU usage
[  241.504366] 5.8.0-rc4-custom-01521-g72a7c7d549c3 #32 Not tainted
[  241.504370] -----------------------------
[  241.504378] net/sched/sch_api.c:270 RCU-list traversed in non-reader section!!
[  241.504382]
               other info that might help us debug this:
[  241.504388]
               rcu_scheduler_active = 2, debug_locks = 1
[  241.504394] 1 lock held by tc/1391:
[  241.504398]  #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0
[  241.504431]
               stack backtrace:
[  241.504440] CPU: 0 PID: 1391 Comm: tc Not tainted 5.8.0-rc4-custom-01521-g72a7c7d549c3 #32
[  241.504446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
[  241.504453] Call Trace:
[  241.504465]  dump_stack+0x100/0x184
[  241.504482]  lockdep_rcu_suspicious+0x153/0x15d
[  241.504499]  qdisc_match_from_root+0x293/0x350

Fix this by passing the rtnl held lockdep condition down to
hlist_for_each_entry_rcu()
Reported-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a8b7b2d0

21 6月, 2020 1 次提交

Remove redundant condition in qdisc_graft · 8eaf8d99

由 Gaurav Singh 提交于 6月 18, 2020

parent cannot be NULL here since its in the else part
of the if (parent == NULL) condition. Remove the extra
check on parent pointer.
Signed-off-by: NGaurav Singh <gaurav1086@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8eaf8d99

28 5月, 2020 1 次提交

net_sched: add a tracepoint for qdisc creation · f5a7833e

由 Cong Wang 提交于 5月 26, 2020

With this tracepoint, we could know when qdisc's are created,
especially those default qdisc's.

Sample output:

tc-736 [001] ...1 56.230107: qdisc_create: dev=ens3 kind=pfifo parent=1:0
tc-736 [001] ...1 56.230113: qdisc_create: dev=ens3 kind=hfsc parent=ffff:ffff
tc-738 [001] ...1 56.256816: qdisc_create: dev=ens3 kind=pfifo parent=1:100
tc-739 [001] ...1 56.267584: qdisc_create: dev=ens3 kind=pfifo parent=1:200
tc-740 [001] ...1 56.279649: qdisc_create: dev=ens3 kind=fq_codel parent=1:100
tc-741 [001] ...1 56.289996: qdisc_create: dev=ens3 kind=pfifo_fast parent=1:200
tc-745 [000] .N.1 111.687483: qdisc_create: dev=ens3 kind=ingress parent=ffff:fff1

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f5a7833e

18 3月, 2020 2 次提交

net_sched: do not reprogram a timer about to expire · b88948fb

由 Eric Dumazet 提交于 3月 16, 2020

qdisc_watchdog_schedule_range_ns() can use the newly added slack
and avoid rearming the hrtimer a bit earlier than the current
value. This patch has no effect if delta_ns parameter
is zero.

Note that this means the max slack is potentially doubled.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b88948fb

net_sched: add qdisc_watchdog_schedule_range_ns() · efe074c2

由 Eric Dumazet 提交于 3月 16, 2020

Some packet schedulers might want to add a slack
when programming hrtimers. This can reduce number
of interrupts and increase batch sizes and thus
give good xmit_more savings.

This commit adds qdisc_watchdog_schedule_range_ns()
helper, with an extra delta_ns parameter.

Legacy qdisc_watchdog_schedule_n() becomes an inline
passing a zero slack.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

efe074c2

27 1月, 2020 2 次提交

net_sched: walk through all child classes in tc_bind_tclass() · 760d228e

由 Cong Wang 提交于 1月 23, 2020

In a complex TC class hierarchy like this:

tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit         \
  avpkt 1000 cell 8
tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit  \
  rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20      \
  avpkt 1000 bounded

tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
  sport 80 0xffff flowid 1:3
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
  sport 25 0xffff flowid 1:4

tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit  \
  rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20      \
  avpkt 1000
tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit  \
  rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20      \
  avpkt 1000

where filters are installed on qdisc 1:0, so we can't merely
search from class 1:1 when creating class 1:3 and class 1:4. We have
to walk through all the child classes of the direct parent qdisc.
Otherwise we would miss filters those need reverse binding.

Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

760d228e

net_sched: fix ops->bind_class() implementations · 2e24cd75

由 Cong Wang 提交于 1月 23, 2020

The current implementations of ops->bind_class() are merely
searching for classid and updating class in the struct tcf_result,
without invoking either of cl_ops->bind_tcf() or
cl_ops->unbind_tcf(). This breaks the design of them as qdisc's
like cbq use them to count filters too. This is why syzbot triggered
the warning in cbq_destroy_class().

In order to fix this, we have to call cl_ops->bind_tcf() and
cl_ops->unbind_tcf() like the filter binding path. This patch does
so by refactoring out two helper functions __tcf_bind_filter()
and __tcf_unbind_filter(), which are lockless and accept a Qdisc
pointer, then teaching each implementation to call them correctly.

Note, we merely pass the Qdisc pointer as an opaque pointer to
each filter, they only need to pass it down to the helper
functions without understanding it at all.

Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2e24cd75

09 10月, 2019 1 次提交

net_sched: fix backward compatibility for TCA_KIND · 6f96c3c6

由 Cong Wang 提交于 10月 07, 2019

Marcelo noticed a backward compatibility issue of TCA_KIND
after we move from NLA_STRING to NLA_NUL_STRING, so it is probably
too late to change it.

Instead, to make everyone happy, we can just insert a NUL to
terminate the string with nla_strlcpy() like we do for TC actions.

Fixes: 62794fc4 ("net_sched: add max len check for TCA_KIND")
Reported-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>

6f96c3c6

22 9月, 2019 1 次提交

net_sched: add max len check for TCA_KIND · 62794fc4

由 Cong Wang 提交于 9月 18, 2019

The TCA_KIND attribute is of NLA_STRING which does not check
the NUL char. KMSAN reported an uninit-value of TCA_KIND which
is likely caused by the lack of NUL.

Change it to NLA_NUL_STRING and add a max len too.

Fixes: 8b4c3cdd ("net: sched: Add policy validation for tc attributes")
Reported-and-tested-by: syzbot+618aacd49e8c8b8486bd@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>

62794fc4

11 9月, 2019 1 次提交

net_sched: check cops->tcf_block in tc_bind_tclass() · 8b142a00

由 Cong Wang 提交于 9月 08, 2019

At least sch_red and sch_tbf don't implement ->tcf_block()
while still have a non-zero tc "class".

Instead of adding nop implementations to each of such qdisc's,
we can just relax the check of cops->tcf_block() in
tc_bind_tclass(). They don't support TC filter anyway.

Reported-by: syzbot+21b29db13c065852f64b@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8b142a00

31 5月, 2019 1 次提交

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 · 2874c5fd

由 Thomas Gleixner 提交于 5月 27, 2019

Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

2874c5fd

28 4月, 2019 2 次提交

netlink: make validation more configurable for future strictness · 8cb08174

由 Johannes Berg 提交于 4月 26, 2019

We currently have two levels of strict validation:

 1) liberal (default)
     - undefined (type >= max) & NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted
     - garbage at end of message accepted
 2) strict (opt-in)
     - NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted

Split out parsing strictness into four different options:
 * TRAILING     - check that there's no trailing data after parsing
                  attributes (in message or nested)
 * MAXTYPE      - reject attrs > max known type
 * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
 * STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.

We end up with the following renames:
 * nla_parse           -> nla_parse_deprecated
 * nla_parse_strict    -> nla_parse_deprecated_strict
 * nlmsg_parse         -> nlmsg_parse_deprecated
 * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
 * nla_parse_nested    -> nla_parse_nested_deprecated
 * nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8cb08174

netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de

由 Michal Kubecek 提交于 4月 26, 2019

Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ae0be8de

11 4月, 2019 1 次提交

net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too · 8a53e616

由 Paolo Abeni 提交于 4月 10, 2019

Since stats updating is always consistent with TCQ_F_CPUSTATS flag,
we can disable it at qdisc creation time flipping such bit.

In my experiments, if the NOLOCK flag is cleared, per CPU stats
accounting does not give any measurable performance gain, but it
waste some memory.

Let's clear TCQ_F_CPUSTATS together with NOLOCK, when enslaving
a NOLOCK qdisc to 'lock' one.

Use stats update helper inside pfifo_fast, to cope correctly with
TCQ_F_CPUSTATS flag change.

As a side effect, q.qlen value for any child qdiscs is always
consistent for all lock classfull qdiscs.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8a53e616

14 3月, 2019 1 次提交

net_sched: return correct value for *notify* functions · 5b5f99b1

由 Zhike Wang 提交于 3月 11, 2019

It is confusing to directly use return value of netlink_send()/
netlink_unicast() as the return value of *notify*, as it may be not
error at all.

Example: in tc_del_tfilter(), after calling tfilter_del_notify(), it will
goto errout if (err). However, the netlink_send()/netlink_unicast() will
return positive value even for successful case. So it may not call
tcf_chain_tp_remove() and so on to clean up the resource, as a result,
resource is leaked.

It may be easier to only check the return value of tfilter_del_nofiy(),
but it is more clean to correct all related functions.
Co-developed-by: NZengmo Gao <gaozengmo@jd.com>
Signed-off-by: NZhike Wang <wangzhike@jd.com>
Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5b5f99b1

19 2月, 2019 1 次提交

net: sched: using kfree_rcu() to simplify the code · 6e07902f

由 Wei Yongjun 提交于 2月 16, 2019

The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.
Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e07902f

18 2月, 2019 1 次提交

net: sched: sch_api: set an error msg when qdisc_alloc_handle() fails · aaeb1dea

由 Ivan Vecera 提交于 2月 15, 2019

This patch sets an error message in extack when the number of qdisc
handles exceeds the maximum. Also the error-code ENOSPC is more
appropriate than ENOMEM in this situation.
Signed-off-by: NIvan Vecera <ivecera@redhat.com>
Reported-by: NLi Shuang <shuali@redhat.com>
Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aaeb1dea

13 2月, 2019 3 次提交

net: sched: extend proto ops to support unlocked classifiers · 12db03b6

由 Vlad Buslov 提交于 2月 11, 2019

Add 'rtnl_held' flag to tcf proto change, delete, destroy, dump, walk
functions to track rtnl lock status. Extend users of these function in cls
API to propagate rtnl lock status to them. This allows classifiers to
obtain rtnl lock when necessary and to pass rtnl lock status to extensions
and driver offload callbacks.

Add flags field to tcf proto ops. Add flag value to indicate that
classifier doesn't require rtnl lock.
Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

12db03b6

net: sched: traverse classifiers in chain with tcf_get_next_proto() · fe2923af

由 Vlad Buslov 提交于 2月 11, 2019

All users of chain->filters_chain rely on rtnl lock and assume that no new
classifier instances are added when traversing the list. Use
tcf_get_next_proto() to traverse filters list without relying on rtnl
mutex. This function iterates over classifiers by taking reference to
current iterator classifier only and doesn't assume external
synchronization of filters list.
Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fe2923af

net: sched: traverse chains in block with tcf_get_next_chain() · bbf73830

由 Vlad Buslov 提交于 2月 11, 2019

All users of block->chain_list rely on rtnl lock and assume that no new
chains are added when traversing the list. Use tcf_get_next_chain() to
traverse chain list without relying on rtnl mutex. This function iterates
over chains by taking reference to current iterator chain only and doesn't
assume external synchronization of chain list.

Don't take reference to all chains in block when flushing and use
tcf_get_next_chain() to safely iterate over chain list instead. Remove
tcf_block_put_all_chains() that is no longer used.
Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bbf73830

20 1月, 2019 1 次提交

sch_api: Change signature of qdisc_tree_reduce_backlog() to use ints · 5f2939d9

由 Toke Høiland-Jørgensen 提交于 1月 09, 2019

There are now several places where qdisc_tree_reduce_backlog() is called
with a negative number of packets (to signal an increase in number of
packets in the queue). Rather than rely on overflow behaviour, change the
function signature to use signed integers to communicate this usage to
people reading the code.
Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f2939d9

16 12月, 2018 1 次提交

net: sched: simplify the qdisc_leaf code · 2561f972

由 Tonghao Zhang 提交于 12月 13, 2018

Except for returning, the var leaf is not
used in the qdisc_leaf(). For simplicity, remove it.
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2561f972

openeuler / Kernel 大约 2 年 前同步成功

openeuler / Kernel
大约 2 年前同步成功