1. 28 Jan 2022 (1 commit)
  2. 13 Oct 2021 (1 commit)
    • pkt_sched: sch_qfq: fix qfq_change_class() error path · 1e1894bb
      Committed by Eric Dumazet
      stable inclusion
      from stable-5.10.50
      commit 5c8e5feceaf3e269d7aef271e9a794d5dbc1dbf1
      bugzilla: 174522 https://gitee.com/openeuler/kernel/issues/I4DNFY
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5c8e5feceaf3e269d7aef271e9a794d5dbc1dbf1
      
      --------------------------------
      
      [ Upstream commit 0cd58e5c ]
      
      If qfq_change_class() is unable to allocate memory for qfq_aggregate,
      it frees the class that has already been inserted into the class hash
      table, but does not unhash it.
      
      Defer the hash-table insertion until after the problematic allocation
      (a sketch of the corrected ordering follows this entry, after the
      syzkaller report).
      
      BUG: KASAN: use-after-free in hlist_add_head include/linux/list.h:884 [inline]
      BUG: KASAN: use-after-free in qdisc_class_hash_insert+0x200/0x210 net/sched/sch_api.c:731
      Write of size 8 at addr ffff88814a534f10 by task syz-executor.4/31478
      
      CPU: 0 PID: 31478 Comm: syz-executor.4 Not tainted 5.13.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x141/0x1d7 lib/dump_stack.c:120
       print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
       __kasan_report mm/kasan/report.c:419 [inline]
       kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
       hlist_add_head include/linux/list.h:884 [inline]
       qdisc_class_hash_insert+0x200/0x210 net/sched/sch_api.c:731
       qfq_change_class+0x96c/0x1990 net/sched/sch_qfq.c:489
       tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113
       rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:674
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
       do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x4665d9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fdc7b5f0188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665d9
      RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003
      RBP: 00007fdc7b5f01d0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
      R13: 00007ffcf7310b3f R14: 00007fdc7b5f0300 R15: 0000000000022000
      
      Allocated by task 31445:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:428 [inline]
       ____kasan_kmalloc mm/kasan/common.c:507 [inline]
       ____kasan_kmalloc mm/kasan/common.c:466 [inline]
       __kasan_kmalloc+0x9b/0xd0 mm/kasan/common.c:516
       kmalloc include/linux/slab.h:556 [inline]
       kzalloc include/linux/slab.h:686 [inline]
       qfq_change_class+0x705/0x1990 net/sched/sch_qfq.c:464
       tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113
       rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:674
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
       do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 31445:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
       kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
       ____kasan_slab_free mm/kasan/common.c:360 [inline]
       ____kasan_slab_free mm/kasan/common.c:325 [inline]
       __kasan_slab_free+0xfb/0x130 mm/kasan/common.c:368
       kasan_slab_free include/linux/kasan.h:212 [inline]
       slab_free_hook mm/slub.c:1583 [inline]
       slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1608
       slab_free mm/slub.c:3168 [inline]
       kfree+0xe5/0x7f0 mm/slub.c:4212
       qfq_change_class+0x10fb/0x1990 net/sched/sch_qfq.c:518
       tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113
       rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:674
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
       do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff88814a534f00
       which belongs to the cache kmalloc-128 of size 128
      The buggy address is located 16 bytes inside of
       128-byte region [ffff88814a534f00, ffff88814a534f80)
      The buggy address belongs to the page:
      page:ffffea0005294d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x14a534
      flags: 0x57ff00000000200(slab|node=1|zone=2|lastcpupid=0x7ff)
      raw: 057ff00000000200 ffffea00004fee00 0000000600000006 ffff8880110418c0
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 29797, ts 604817765317, free_ts 604810151744
       prep_new_page mm/page_alloc.c:2358 [inline]
       get_page_from_freelist+0x1033/0x2b60 mm/page_alloc.c:3994
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5200
       alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272
       alloc_slab_page mm/slub.c:1646 [inline]
       allocate_slab+0x2c5/0x4c0 mm/slub.c:1786
       new_slab mm/slub.c:1849 [inline]
       new_slab_objects mm/slub.c:2595 [inline]
       ___slab_alloc+0x4a1/0x810 mm/slub.c:2758
       __slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2798
       slab_alloc_node mm/slub.c:2880 [inline]
       slab_alloc mm/slub.c:2922 [inline]
       __kmalloc+0x315/0x330 mm/slub.c:4050
       kmalloc include/linux/slab.h:561 [inline]
       kzalloc include/linux/slab.h:686 [inline]
       __register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1318
       mpls_dev_sysctl_register+0x1b7/0x2d0 net/mpls/af_mpls.c:1421
       mpls_add_dev net/mpls/af_mpls.c:1472 [inline]
       mpls_dev_notify+0x214/0x8b0 net/mpls/af_mpls.c:1588
       notifier_call_chain+0xb5/0x200 kernel/notifier.c:83
       call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2121
       call_netdevice_notifiers_extack net/core/dev.c:2133 [inline]
       call_netdevice_notifiers net/core/dev.c:2147 [inline]
       register_netdevice+0x106b/0x1500 net/core/dev.c:10312
       veth_newlink+0x585/0xac0 drivers/net/veth.c:1547
       __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3452
       rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1298 [inline]
       free_pcp_prepare+0x223/0x300 mm/page_alloc.c:1342
       free_unref_page_prepare mm/page_alloc.c:3250 [inline]
       free_unref_page+0x12/0x1d0 mm/page_alloc.c:3298
       __vunmap+0x783/0xb60 mm/vmalloc.c:2566
       free_work+0x58/0x70 mm/vmalloc.c:80
       process_one_work+0x98d/0x1600 kernel/workqueue.c:2276
       worker_thread+0x64c/0x1120 kernel/workqueue.c:2422
       kthread+0x3b1/0x4a0 kernel/kthread.c:313
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
      Memory state around the buggy address:
       ffff88814a534e00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88814a534e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88814a534f00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
       ffff88814a534f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88814a535000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
      Fixes: 462dbc91 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      1e1894bb
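      A minimal C sketch of the ordering change, with simplified names
      (the real code lives in net/sched/sch_qfq.c; treat the exact calls
      and error handling as illustrative rather than the literal diff):
      
          /* Buggy ordering: the class is hashed first, so the allocation
           * failure below frees a class that is still reachable through
           * the hash table, producing the use-after-free reported above. */
          qdisc_class_hash_insert(&q->clhash, &cl->common);
          agg = kzalloc(sizeof(*agg), GFP_KERNEL);
          if (!agg) {
                  qfq_destroy_class(sch, cl);  /* freed, but still hashed */
                  return -ENOBUFS;
          }
      
          /* Fixed ordering: perform the fallible allocation first and only
           * insert into the hash table once nothing below can fail. */
          agg = kzalloc(sizeof(*agg), GFP_KERNEL);
          if (!agg) {
                  qfq_destroy_class(sch, cl);  /* never hashed, so safe */
                  return -ENOBUFS;
          }
          qdisc_class_hash_insert(&q->clhash, &cl->common);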
  3. 17 Jul 2020 (1 commit)
  4. 08 Jul 2020 (1 commit)
  5. 30 Jun 2020 (1 commit)
    • net: sched: Pass root lock to Qdisc_ops.enqueue · aebe4426
      Committed by Petr Machata
      A following patch introduces qevents, points in a qdisc's algorithm
      where a packet can be processed by user-defined filters. Should this
      processing lead to a situation where a new packet is to be enqueued
      on the same port, holding the root lock would lead to deadlocks. To
      solve the issue, the qevent handler needs to unlock and relock the
      root lock when necessary.
      
      To that end, add a root lock argument to the qdisc enqueue op and
      propagate it throughout (a signature sketch follows this entry).
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aebe4426
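      A sketch of the changed op, modeled on struct Qdisc_ops in
      include/net/sch_generic.h (surrounding members elided):
      
          struct Qdisc_ops {
                  /* ... */
                  /* The root lock now travels with the call, so a qevent
                   * handler may temporarily release and re-take it instead
                   * of deadlocking when its filters re-enter the same port. */
                  int (*enqueue)(struct sk_buff *skb, struct Qdisc *sch,
                                 spinlock_t *root_lock,  /* new argument */
                                 struct sk_buff **to_free);
                  /* ... */
          };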
  6. 19 Jun 2019 (1 commit)
  7. 28 Apr 2019 (2 commits)
    • netlink: make validation more configurable for future strictness · 8cb08174
      Committed by Johannes Berg
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally, this allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful here,
      since it can be arranged (by filling in the policy) not to be an
      incompatible userspace ABI change, but would then, going forward,
      prevent forgetting attribute entries. The same applies to the POLICY
      flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet, so that the build breaks if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc. as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect, then, this adds fully strict validation for any new
      command (a sketch of how the flags compose follows this entry).
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8cb08174
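      A C sketch of how the four options compose, modeled on the
      NL_VALIDATE_* flags this series adds to include/net/netlink.h
      (exact values, and any flags added later, are illustrative):
      
          enum netlink_validation {
                  NL_VALIDATE_LIBERAL = 0,            /* old default parsing */
                  NL_VALIDATE_TRAILING = BIT(0),      /* no trailing garbage */
                  NL_VALIDATE_MAXTYPE = BIT(1),       /* reject attrs > maxtype */
                  NL_VALIDATE_UNSPEC = BIT(2),        /* reject NLA_UNSPEC attrs */
                  NL_VALIDATE_STRICT_ATTRS = BIT(3),  /* strict attribute sizes */
          };
      
          /* The renamed *_parse_deprecated_strict() corresponds to: */
          #define NL_VALIDATE_DEPRECATED_STRICT \
                  (NL_VALIDATE_TRAILING | NL_VALIDATE_MAXTYPE)
      
          /* New callers should get everything: */
          #define NL_VALIDATE_STRICT \
                  (NL_VALIDATE_DEPRECATED_STRICT | NL_VALIDATE_UNSPEC | \
                   NL_VALIDATE_STRICT_ATTRS)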
    • netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Committed by Michal Kubecek
      Even though the NLA_F_NESTED flag was introduced more than 11 years
      ago, most netlink based interfaces (including recently added ones) are
      still not setting it in kernel generated messages. Without the flag,
      message parsers not aware of attribute semantics (e.g. the wireshark
      dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested
      attributes and won't display the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere, as there may be
      userspace applications which check nlattr::nla_type directly rather
      than through a helper masking out the flags. Therefore the patch
      renames nla_nest_start() to nla_nest_start_noflag() and introduces
      nla_nest_start() as a wrapper adding NLA_F_NESTED (see the sketch
      after this entry). The calls which add NLA_F_NESTED manually are
      rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae0be8de
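      The wrapper itself is small; a sketch matching the description above
      (per include/net/netlink.h after this patch):
      
          static inline struct nlattr *nla_nest_start(struct sk_buff *skb,
                                                      int attrtype)
          {
                  /* Always set NLA_F_NESTED so semantics-agnostic parsers
                   * can recognize the attribute as a container. */
                  return nla_nest_start_noflag(skb, attrtype | NLA_F_NESTED);
          }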
  8. 02 Apr 2019 (2 commits)
    • net: sched: introduce and use qdisc tree flush/purge helpers · e5f0e8f8
      Committed by Paolo Abeni
      The same code to flush a qdisc tree and purge the qdisc queue
      is duplicated in many places, and in most cases it does not
      respect NOLOCK qdiscs: the global backlog length is used and the
      per-CPU values are ignored.
      
      This change addresses the above, factoring out the relevant
      code and using the helpers introduced by the previous patch
      to fetch the correct backlog length (a sketch follows this entry).
      
      Fixes: c5ad119f ("net: sched: pfifo_fast use skb_array")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e5f0e8f8
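      A sketch of the helper pair, modeled on what this patch adds to
      include/net/sch_generic.h (names per upstream, bodies simplified):
      
          /* Flush: propagate the child's current qlen/backlog up the tree. */
          static inline void qdisc_tree_flush_backlog(struct Qdisc *sch)
          {
                  __u32 qlen, backlog;
      
                  qdisc_qstats_qlen_backlog(sch, &qlen, &backlog);
                  qdisc_tree_reduce_backlog(sch, qlen, backlog);
          }
      
          /* Purge: additionally reset the queue itself. */
          static inline void qdisc_purge_queue(struct Qdisc *sch)
          {
                  __u32 qlen, backlog;
      
                  qdisc_qstats_qlen_backlog(sch, &qlen, &backlog);
                  qdisc_reset(sch);
                  qdisc_tree_reduce_backlog(sch, qlen, backlog);
          }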
    • net: sched: introduce and use qstats read helpers · 5dd431b6
      Committed by Paolo Abeni
      Classful qdiscs can't directly access a child qdisc's backlog
      length: if such a qdisc is NOLOCK, per-CPU values should be
      accounted instead.
      
      Most qdiscs do not respect the above. As a result, qstats fetching
      for most classful qdiscs is currently incorrect: if the child qdisc
      is NOLOCK, it always reports a backlog length of 0.
      
      This change introduces a pair of helpers to safely fetch
      both backlog and qlen and uses them in the stats class dumping
      functions, fixing the above issue and cleaning up the code a bit
      (a sketch of the read helper follows this entry).
      
      DRR also needs to access the child qdisc queue length, so it
      needs custom handling.
      
      Fixes: c5ad119f ("net: sched: pfifo_fast use skb_array")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5dd431b6
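      A simplified sketch of the read helper (the upstream version folds
      per-CPU queue stats in via the gnet_stats infrastructure; treat the
      exact body as illustrative):
      
          static inline void qdisc_qstats_qlen_backlog(struct Qdisc *sch,
                                                       __u32 *qlen,
                                                       __u32 *backlog)
          {
                  struct gnet_stats_queue qstats = { 0 };
      
                  /* For NOLOCK qdiscs this sums the per-CPU counters;
                   * for classic qdiscs it reduces to sch->qstats. */
                  __gnet_stats_copy_queue(&qstats, sch->cpu_qstats,
                                          &sch->qstats, qdisc_qlen_sum(sch));
                  *qlen = qstats.qlen;
                  *backlog = qstats.backlog;
          }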
  9. 16 Jan 2019 (2 commits)
  10. 26 Sep 2018 (1 commit)
  11. 22 Dec 2017 (6 commits)
  12. 22 Oct 2017 (1 commit)
  13. 17 Oct 2017 (1 commit)
  14. 07 Sep 2017 (1 commit)
  15. 26 Aug 2017 (1 commit)
    • net_sched: remove tc class reference counting · 143976ce
      Committed by WANG Cong
      For TC classes, their ->get() and ->put() are always paired, and the
      reference counting is completely useless, because:
      
      1) For class modification and dumping paths, we already hold the
         RTNL lock, so all of these ->get(), ->change(), ->put() are
         atomic.
      
      2) For filter binding/unbinding, we use a different reference
         counter than this one, and those paths hold the RTNL lock too.
      
      3) ->qlen_notify() is special because it is called on the
         ->enqueue() path, but we already hold the qdisc tree lock there,
         and we hold this tree lock when grafting or deleting the class
         too, so it should not be gone or changed until we release the
         tree lock.
      
      Therefore, this patch removes ->get() and ->put(), but:
      
      1) Adds a new ->find() to find the pointer to a class by classid,
         with no refcnt.
      
      2) Moves the original destroy-on-last-refcnt into ->delete(), right
         after releasing the tree lock. This is fine because the class is
         already removed from the hash while holding the lock.
      
      For those who also use ->put() as ->unbind(), just rename them to
      reflect this change (an ops sketch follows this entry).
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      143976ce
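      A sketch of the resulting ops change, modeled on struct
      Qdisc_class_ops (surrounding members elided):
      
          struct Qdisc_class_ops {
                  /* Removed: the always-paired, refcounted lookup.
                   *   unsigned long (*get)(struct Qdisc *, u32 classid);
                   *   void          (*put)(struct Qdisc *, unsigned long);
                   */
      
                  /* Added: a plain lookup with no reference counting;
                   * RTNL and the qdisc tree lock protect the class instead. */
                  unsigned long (*find)(struct Qdisc *, u32 classid);
                  /* ... */
          };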
  16. 17 Aug 2017 (1 commit)
  17. 07 Jun 2017 (1 commit)
  18. 18 May 2017 (2 commits)
  19. 14 Apr 2017 (1 commit)
  20. 13 Mar 2017 (1 commit)
  21. 06 Dec 2016 (1 commit)
    • net_sched: gen_estimator: complete rewrite of rate estimators · 1c0d32fd
      Committed by Eric Dumazet
      1) The old code was hard to maintain, due to complex lock chains.
         (We will probably be able to remove some kfree_rcu() in callers.)
      
      2) Using a single timer to update all estimators does not scale.
      
      3) The code was buggy on 32-bit kernels (a WRITE_ONCE() on a 64-bit
         quantity is not supposed to work well).
      
      In this rewrite:
      
      - I removed the RB tree that had to be scanned in
        gen_estimator_active(). qdisc dumps should be much faster.
      
      - Each estimator has its own timer.
      
      - Estimations are maintained in net_rate_estimator structure,
        instead of dirtying the qdisc. Minor, but part of the simplification.
      
      - Reading an estimator uses RCU and a seqcount to provide proper
        support for 32-bit kernels (a read-path sketch follows this entry).
      
      - We reduce memory need when estimators are not used, since
        we store a pointer, instead of the bytes/packets counters.
      
      - xt_rateest_mt() no longer has to grab a spinlock.
        (In the future, xt_rateest_tg() could be switched to per cpu counters)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1c0d32fd
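      A read-path sketch, loosely modeled on the rewrite's
      gen_estimator_read() (field names and the fixed-point scaling are
      illustrative):
      
          static bool rate_est_read(struct net_rate_estimator __rcu **rate_est,
                                    struct gnet_stats_rate_est64 *sample)
          {
                  struct net_rate_estimator *est;
                  unsigned int seq;
      
                  rcu_read_lock();
                  est = rcu_dereference(*rate_est);
                  if (!est) {
                          rcu_read_unlock();
                          return false;
                  }
                  /* The per-estimator seqcount lets 32-bit kernels read the
                   * 64-bit averages consistently without taking a spinlock. */
                  do {
                          seq = read_seqcount_begin(&est->seq);
                          sample->bps = est->avbps >> 8;  /* scaling illustrative */
                          sample->pps = est->avpps >> 8;
                  } while (read_seqcount_retry(&est->seq, seq));
                  rcu_read_unlock();
                  return true;
          }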
  22. 23 Sep 2016 (1 commit)
  23. 26 Jun 2016 (1 commit)
    • net_sched: drop packets after root qdisc lock is released · 520ac30f
      Committed by Eric Dumazet
      Qdisc performance suffers when packets are dropped at enqueue()
      time, because the drops (kfree_skb()) are done while the qdisc lock
      is held, delaying a dequeue() from draining the queue.
      
      Nominal throughput can be reduced by 50% when this happens,
      at a time we would like the dequeue() to proceed as fast as possible.
      
      Even FQ is vulnerable to this problem, even though one of FQ's goals
      was to provide some flow isolation.
      
      This patch adds a 'struct sk_buff **to_free' parameter to all
      qdisc->enqueue() implementations and to the qdisc_drop() helper
      (a sketch of the pattern follows this entry).
      
      I measured a performance increase of up to 12%, but this patch
      is a prereq so that future batches in enqueue() can fly.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      520ac30f
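      A sketch of the deferred-free pattern (the helper matches the shape
      of the upstream __qdisc_drop(); the caller fragment is illustrative):
      
          /* Dropped skbs are chained onto *to_free instead of being freed
           * with kfree_skb() while the qdisc lock is held. */
          static inline void __qdisc_drop(struct sk_buff *skb,
                                          struct sk_buff **to_free)
          {
                  skb->next = *to_free;
                  *to_free = skb;
          }
      
          /* Caller side (e.g. __dev_xmit_skb), after releasing the root
           * lock, frees the whole chain in one pass: */
          if (unlikely(to_free))
                  kfree_skb_list(to_free);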
  24. 09 Jun 2016 (2 commits)
  25. 08 Jun 2016 (1 commit)
    • net: sched: do not acquire qdisc spinlock in qdisc/class stats dump · edb09eb1
      Committed by Eric Dumazet
      Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by the Google
      BwE host agent [1] are problematic at scale:
      
      For each qdisc/class found in the dump, we currently lock the root
      qdisc spinlock in order to get stats. Sampling stats every 5 seconds
      from thousands of HTB classes is a challenge when the root qdisc
      spinlock is under high pressure. Not only do the dumps take time,
      they also slow down the fast path (queueing/dequeueing packets) by
      10% to 20% in some cases.
      
      An audit of existing qdiscs showed that sch_fq_codel is the only
      qdisc that might need the qdisc lock in fq_codel_dump_stats() and
      fq_codel_dump_class_stats().
      
      In v2 of this patch, I now use the Qdisc running seqcount to provide
      consistent reads of packets/bytes counters, regardless of 32/64-bit
      arches (a read-side sketch follows this entry).
      
      I also changed the rate estimators to use the same infrastructure,
      so that they no longer need to take the root qdisc lock.
      [1]
      http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Kevin Athey <kda@google.com>
      Cc: Xiaotian Pei <xiaotian@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      edb09eb1
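      A read-side sketch of lockless stats sampling under the running
      seqcount (a simplified illustration; upstream plumbs the seqcount
      through gnet_stats_copy_basic() and friends):
      
          static void sample_bstats(const seqcount_t *running,
                                    const struct gnet_stats_basic_packed *b,
                                    u64 *bytes, u64 *packets)
          {
                  unsigned int seq;
      
                  /* Retry until the qdisc was not running concurrently,
                   * yielding a consistent snapshot without the root lock. */
                  do {
                          seq = read_seqcount_begin(running);
                          *bytes = b->bytes;
                          *packets = b->packets;
                  } while (read_seqcount_retry(running, seq));
          }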
  26. 01 Mar 2016 (2 commits)
  27. 28 Aug 2015 (1 commit)
    • net: sched: consolidate tc_classify{,_compat} · 3b3ae880
      Committed by Daniel Borkmann
      For classifiers invoked via tc_classify(), we always pay for an
      extra function call into tc_classify_compat(), as both are exported
      symbols and tc_classify() itself doesn't do much except handle
      reclassifications when tp->classify() returns TC_ACT_RECLASSIFY.
      
      CBQ and ATM are the only qdiscs that call directly into
      tc_classify_compat(); all others use tc_classify(). When tc actions
      are configured out of the kernel, tc_classify() effectively does
      nothing besides delegating.
      
      We can spare this layer and consolidate both functions (a loop
      sketch follows this entry). pktgen on a single CPU, constantly
      pushing skbs directly into the netif_receive_skb() path with a dummy
      classifier attached on the ingress qdisc, improves slightly from
      22.3 Mpps to 23.1 Mpps.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b3ae880
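      A sketch of the consolidated classifier walk (a simplified
      illustration; the loop-limit name is hypothetical and the protocol
      checks of the real function are elided):
      
          static int classify_sketch(struct sk_buff *skb,
                                     const struct tcf_proto *tp,
                                     struct tcf_result *res, bool compat_mode)
          {
                  const struct tcf_proto *old_tp = tp;
                  int limit = 0;
      
          reclassify:
                  for (; tp; tp = rcu_dereference_bh(tp->next)) {
                          int err = tp->classify(skb, tp, res);
      
                          /* Reclassification restarts the walk instead of
                           * bouncing through a second exported wrapper. */
                          if (unlikely(err == TC_ACT_RECLASSIFY && !compat_mode))
                                  goto reset;
                          if (err >= 0)
                                  return err;
                  }
                  return TC_ACT_UNSPEC;  /* no match */
      
          reset:
                  if (unlikely(limit++ >= MAX_REC_LOOP))  /* hypothetical name */
                          return TC_ACT_SHOT;  /* break reclassify loops */
                  tp = old_tp;
                  goto reclassify;
          }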
  28. 16 Jul 2015 (1 commit)
  29. 22 Jun 2015 (1 commit)