1. 17 8月, 2022 2 次提交
  2. 10 6月, 2022 1 次提交
  3. 14 2月, 2022 1 次提交
    • E
      net_sched: add __rcu annotation to netdev->qdisc · 5891cd5e
      Eric Dumazet 提交于
      syzbot found a data-race [1] which lead me to add __rcu
      annotations to netdev->qdisc, and proper accessors
      to get LOCKDEP support.
      
      [1]
      BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu
      
      write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1:
       attach_default_qdiscs net/sched/sch_generic.c:1167 [inline]
       dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221
       __dev_open+0x2e9/0x3a0 net/core/dev.c:1416
       __dev_change_flags+0x167/0x3f0 net/core/dev.c:8139
       rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150
       __rtnl_newlink net/core/rtnetlink.c:3489 [inline]
       rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2496
       __do_sys_sendmsg net/socket.c:2505 [inline]
       __se_sys_sendmsg net/socket.c:2503 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0:
       qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323
       __tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050
       tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211
       rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2496
       __do_sys_sendmsg net/socket.c:2505 [inline]
       __se_sys_sendmsg net/socket.c:2503 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 470502de ("net: sched: unlock rules update API")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5891cd5e
  4. 20 1月, 2022 1 次提交
  5. 13 1月, 2022 1 次提交
    • M
      sch_api: Don't skip qdisc attach on ingress · de2d807b
      Maxim Mikityanskiy 提交于
      The attach callback of struct Qdisc_ops is used by only a few qdiscs:
      mq, mqprio and htb. qdisc_graft() contains the following logic
      (pseudocode):
      
          if (!qdisc->ops->attach) {
              if (ingress)
                  do ingress stuff;
              else
                  do egress stuff;
          }
          if (!ingress) {
              ...
              if (qdisc->ops->attach)
                  qdisc->ops->attach(qdisc);
          } else {
              ...
          }
      
      As we see, the attach callback is not called if the qdisc is being
      attached to ingress (TC_H_INGRESS). That wasn't a problem for mq and
      mqprio, since they contain a check that they are attached to TC_H_ROOT,
      and they can't be attached to TC_H_INGRESS anyway.
      
      However, the commit cited below added the attach callback to htb. It is
      needed for the hardware offload, but in the non-offload mode it
      simulates the "do egress stuff" part of the pseudocode above. The
      problem is that when htb is attached to ingress, neither "do ingress
      stuff" nor attach() is called. It results in an inconsistency, and the
      following message is printed to dmesg:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 2
      
      This commit addresses the issue by running "do ingress stuff" in the
      ingress flow even in the attach callback is present, which is fine,
      because attach isn't going to be called afterwards.
      
      The bug was found by syzbot and reported by Eric.
      
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Signed-off-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
      Reported-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de2d807b
  6. 06 1月, 2022 1 次提交
  7. 18 10月, 2021 2 次提交
    • A
      net: sched: Remove Qdisc::running sequence counter · 29cbcd85
      Ahmed S. Darwish 提交于
      The Qdisc::running sequence counter has two uses:
      
        1. Reliably reading qdisc's tc statistics while the qdisc is running
           (a seqcount read/retry loop at gnet_stats_add_basic()).
      
        2. As a flag, indicating whether the qdisc in question is running
           (without any retry loops).
      
      For the first usage, the Qdisc::running sequence counter write section,
      qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
      is actually needed: the raw qdisc's bstats update. A u64_stats sync
      point was thus introduced (in previous commits) inside the bstats
      structure itself. A local u64_stats write section is then started and
      stopped for the bstats updates.
      
      Use that u64_stats sync point mechanism for the bstats read/retry loop
      at gnet_stats_add_basic().
      
      For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
      accessed with atomic bitops, is sufficient. Using a bit flag instead of
      a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
      to the SMP barriers implicitly added through raw_read_seqcount() and
      write_seqcount_begin/end() getting removed. All call sites have been
      surveyed though, and no required ordering was identified.
      
      Now that the qdisc->running sequence counter is no longer used, remove
      it.
      
      Note, using u64_stats implies no sequence counter protection for 64-bit
      architectures. This can lead to the qdisc tc statistics "packets" vs.
      "bytes" values getting out of sync on rare occasions. The individual
      values will still be valid.
      Signed-off-by: NAhmed S. Darwish <a.darwish@linutronix.de>
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29cbcd85
    • A
      net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types · 50dc9a85
      Ahmed S. Darwish 提交于
      The only factor differentiating per-CPU bstats data type (struct
      gnet_stats_basic_cpu) from the packed non-per-CPU one (struct
      gnet_stats_basic_packed) was a u64_stats sync point inside the former.
      The two data types are now equivalent: earlier commits added a u64_stats
      sync point to the latter.
      
      Combine both data types into "struct gnet_stats_basic_sync". This
      eliminates redundancy and simplifies the bstats read/write APIs.
      
      Use u64_stats_t for bstats "packets" and "bytes" data types. On 64-bit
      architectures, u64_stats sync points do not use sequence counter
      protection.
      Signed-off-by: NAhmed S. Darwish <a.darwish@linutronix.de>
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50dc9a85
  8. 30 9月, 2021 1 次提交
  9. 26 9月, 2021 1 次提交
    • net: prevent user from passing illegal stab size · b193e15a
      王贇 提交于
      We observed below report when playing with netlink sock:
      
        UBSAN: shift-out-of-bounds in net/sched/sch_api.c:580:10
        shift exponent 249 is too large for 32-bit type
        CPU: 0 PID: 685 Comm: a.out Not tainted
        Call Trace:
         dump_stack_lvl+0x8d/0xcf
         ubsan_epilogue+0xa/0x4e
         __ubsan_handle_shift_out_of_bounds+0x161/0x182
         __qdisc_calculate_pkt_len+0xf0/0x190
         __dev_queue_xmit+0x2ed/0x15b0
      
      it seems like kernel won't check the stab log value passing from
      user, and will use the insane value later to calculate pkt_len.
      
      This patch just add a check on the size/cell_log to avoid insane
      calculation.
      Reported-by: NAbaci <abaci@linux.alibaba.com>
      Signed-off-by: NMichael Wang <yun.wang@linux.alibaba.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b193e15a
  10. 17 7月, 2021 1 次提交
  11. 05 3月, 2021 1 次提交
  12. 23 1月, 2021 1 次提交
  13. 16 1月, 2021 1 次提交
    • E
      net_sched: reject silly cell_log in qdisc_get_rtab() · e4bedf48
      Eric Dumazet 提交于
      iproute2 probably never goes beyond 8 for the cell exponent,
      but stick to the max shift exponent for signed 32bit.
      
      UBSAN reported:
      UBSAN: shift-out-of-bounds in net/sched/sch_api.c:389:22
      shift exponent 130 is too large for 32-bit type 'int'
      CPU: 1 PID: 8450 Comm: syz-executor586 Not tainted 5.11.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x183/0x22e lib/dump_stack.c:120
       ubsan_epilogue lib/ubsan.c:148 [inline]
       __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395
       __detect_linklayer+0x2a9/0x330 net/sched/sch_api.c:389
       qdisc_get_rtab+0x2b5/0x410 net/sched/sch_api.c:435
       cbq_init+0x28f/0x12c0 net/sched/sch_cbq.c:1180
       qdisc_create+0x801/0x1470 net/sched/sch_api.c:1246
       tc_modify_qdisc+0x9e3/0x1fc0 net/sched/sch_api.c:1662
       rtnetlink_rcv_msg+0xb1d/0xe60 net/core/rtnetlink.c:5564
       netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2494
       netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
       netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1330
       netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg net/socket.c:672 [inline]
       ____sys_sendmsg+0x5a2/0x900 net/socket.c:2345
       ___sys_sendmsg net/socket.c:2399 [inline]
       __sys_sendmsg+0x319/0x400 net/socket.c:2432
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NCong Wang <cong.wang@bytedance.com>
      Link: https://lore.kernel.org/r/20210114160637.1660597-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      e4bedf48
  14. 02 12月, 2020 1 次提交
  15. 17 11月, 2020 2 次提交
  16. 21 7月, 2020 1 次提交
    • J
      sched: sch_api: add missing rcu read lock to silence the warning · a8b7b2d0
      Jiri Pirko 提交于
      In case the qdisc_match_from_root function() is called from non-rcu path
      with rtnl mutex held, a suspiciout rcu usage warning appears:
      
      [  241.504354] =============================
      [  241.504358] WARNING: suspicious RCU usage
      [  241.504366] 5.8.0-rc4-custom-01521-g72a7c7d549c3 #32 Not tainted
      [  241.504370] -----------------------------
      [  241.504378] net/sched/sch_api.c:270 RCU-list traversed in non-reader section!!
      [  241.504382]
                     other info that might help us debug this:
      [  241.504388]
                     rcu_scheduler_active = 2, debug_locks = 1
      [  241.504394] 1 lock held by tc/1391:
      [  241.504398]  #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0
      [  241.504431]
                     stack backtrace:
      [  241.504440] CPU: 0 PID: 1391 Comm: tc Not tainted 5.8.0-rc4-custom-01521-g72a7c7d549c3 #32
      [  241.504446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
      [  241.504453] Call Trace:
      [  241.504465]  dump_stack+0x100/0x184
      [  241.504482]  lockdep_rcu_suspicious+0x153/0x15d
      [  241.504499]  qdisc_match_from_root+0x293/0x350
      
      Fix this by passing the rtnl held lockdep condition down to
      hlist_for_each_entry_rcu()
      Reported-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8b7b2d0
  17. 21 6月, 2020 1 次提交
  18. 28 5月, 2020 1 次提交
    • C
      net_sched: add a tracepoint for qdisc creation · f5a7833e
      Cong Wang 提交于
      With this tracepoint, we could know when qdisc's are created,
      especially those default qdisc's.
      
      Sample output:
      
        tc-736   [001] ...1    56.230107: qdisc_create: dev=ens3 kind=pfifo parent=1:0
        tc-736   [001] ...1    56.230113: qdisc_create: dev=ens3 kind=hfsc parent=ffff:ffff
        tc-738   [001] ...1    56.256816: qdisc_create: dev=ens3 kind=pfifo parent=1:100
        tc-739   [001] ...1    56.267584: qdisc_create: dev=ens3 kind=pfifo parent=1:200
        tc-740   [001] ...1    56.279649: qdisc_create: dev=ens3 kind=fq_codel parent=1:100
        tc-741   [001] ...1    56.289996: qdisc_create: dev=ens3 kind=pfifo_fast parent=1:200
        tc-745   [000] .N.1   111.687483: qdisc_create: dev=ens3 kind=ingress parent=ffff:fff1
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5a7833e
  19. 18 3月, 2020 2 次提交
  20. 27 1月, 2020 2 次提交
    • C
      net_sched: walk through all child classes in tc_bind_tclass() · 760d228e
      Cong Wang 提交于
      In a complex TC class hierarchy like this:
      
      tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit         \
        avpkt 1000 cell 8
      tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit  \
        rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20      \
        avpkt 1000 bounded
      
      tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
        sport 80 0xffff flowid 1:3
      tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
        sport 25 0xffff flowid 1:4
      
      tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit  \
        rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20      \
        avpkt 1000
      tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit  \
        rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20      \
        avpkt 1000
      
      where filters are installed on qdisc 1:0, so we can't merely
      search from class 1:1 when creating class 1:3 and class 1:4. We have
      to walk through all the child classes of the direct parent qdisc.
      Otherwise we would miss filters those need reverse binding.
      
      Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      760d228e
    • C
      net_sched: fix ops->bind_class() implementations · 2e24cd75
      Cong Wang 提交于
      The current implementations of ops->bind_class() are merely
      searching for classid and updating class in the struct tcf_result,
      without invoking either of cl_ops->bind_tcf() or
      cl_ops->unbind_tcf(). This breaks the design of them as qdisc's
      like cbq use them to count filters too. This is why syzbot triggered
      the warning in cbq_destroy_class().
      
      In order to fix this, we have to call cl_ops->bind_tcf() and
      cl_ops->unbind_tcf() like the filter binding path. This patch does
      so by refactoring out two helper functions __tcf_bind_filter()
      and __tcf_unbind_filter(), which are lockless and accept a Qdisc
      pointer, then teaching each implementation to call them correctly.
      
      Note, we merely pass the Qdisc pointer as an opaque pointer to
      each filter, they only need to pass it down to the helper
      functions without understanding it at all.
      
      Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
      Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e24cd75
  21. 09 10月, 2019 1 次提交
  22. 22 9月, 2019 1 次提交
  23. 11 9月, 2019 1 次提交
  24. 31 5月, 2019 1 次提交
  25. 28 4月, 2019 2 次提交
    • J
      netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg 提交于
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8cb08174
    • M
      netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek 提交于
      Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae0be8de
  26. 11 4月, 2019 1 次提交
    • P
      net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too · 8a53e616
      Paolo Abeni 提交于
      Since stats updating is always consistent with TCQ_F_CPUSTATS flag,
      we can disable it at qdisc creation time flipping such bit.
      
      In my experiments, if the NOLOCK flag is cleared, per CPU stats
      accounting does not give any measurable performance gain, but it
      waste some memory.
      
      Let's clear TCQ_F_CPUSTATS together with NOLOCK, when enslaving
      a NOLOCK qdisc to 'lock' one.
      
      Use stats update helper inside pfifo_fast, to cope correctly with
      TCQ_F_CPUSTATS flag change.
      
      As a side effect, q.qlen value for any child qdiscs is always
      consistent for all lock classfull qdiscs.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a53e616
  27. 14 3月, 2019 1 次提交
  28. 19 2月, 2019 1 次提交
  29. 18 2月, 2019 1 次提交
  30. 13 2月, 2019 3 次提交
  31. 20 1月, 2019 1 次提交
  32. 16 12月, 2018 1 次提交