1. 20 Jun, 2020 · 1 commit
  2. 16 May, 2020 · 1 commit
  3. 31 Mar, 2020 · 1 commit
    • net: sched: expose HW stats types per action used by drivers · 93a129eb
      Jiri Pirko authored
      It may be up to the driver (in case the ANY HW stats type is passed) to
      select which type of HW stats it is going to use. Add infrastructure to
      expose this information to the user.
      
      $ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
      $ tc -s filter show dev enp3s0np1 ingress
      filter protocol ip pref 1 flower chain 0
      filter protocol ip pref 1 flower chain 0 handle 0x1
        eth_type ipv4
        dst_ip 192.168.1.1
        in_hw in_hw_count 2
              action order 1: gact action drop
               random type none pass val 0
               index 1 ref 1 bind 1 installed 10 sec used 10 sec
              Action statistics:
              Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              used_hw_stats immediate     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
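
      A rough, self-contained C sketch of the idea (the types and field names
      here are illustrative assumptions, not the kernel's actual definitions):
      the driver records which HW stats type it actually used together with a
      validity flag, and the dump path only prints the value once a driver has
      reported it, which is what the "used_hw_stats immediate" line above shows.

      /* Conceptual model only -- not the kernel API. */
      #include <stdbool.h>
      #include <stdio.h>

      enum hw_stats_type { HW_STATS_IMMEDIATE, HW_STATS_DELAYED, HW_STATS_DISABLED };

      struct action_stats {
          unsigned long long bytes, packets;
          enum hw_stats_type used_hw_stats;  /* filled in by the driver */
          bool used_hw_stats_valid;          /* dump only after a driver reported */
      };

      /* Driver-side reporting: a driver that polls its counters periodically
       * would pass HW_STATS_DELAYED here instead. */
      static void driver_report(struct action_stats *s, unsigned long long b,
                                unsigned long long p, enum hw_stats_type type)
      {
          s->bytes += b;
          s->packets += p;
          s->used_hw_stats = type;
          s->used_hw_stats_valid = true;
      }

      int main(void)
      {
          static const char *names[] = { "immediate", "delayed", "disabled" };
          struct action_stats s = {0};

          driver_report(&s, 0, 0, HW_STATS_IMMEDIATE);
          if (s.used_hw_stats_valid)  /* only dump once a driver has reported */
              printf("used_hw_stats %s\n", names[s.used_hw_stats]);
          return 0;
      }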
  4. 27 Mar, 2020 · 1 commit
  5. 15 Mar, 2020 · 1 commit
    • net: sched: RED: Introduce an ECN nodrop mode · 0a7fad23
      Petr Machata authored
      When the RED Qdisc is configured to enable ECN, the RED algorithm is
      used to decide whether a certain SKB should be marked. If that SKB is
      not ECN-capable, it is early-dropped.
      
      It is also possible to keep all traffic in the queue, and just mark the
      ECN-capable subset of it, as appropriate under the RED algorithm. Some
      switches support this mode, and some installations make use of it.
      
      To that end, add a new RED flag, TC_RED_NODROP. When the Qdisc is
      configured with this flag, non-ECT traffic is enqueued instead of being
      early-dropped.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
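
      A minimal C sketch of the enqueue decision this flag changes (the flag
      macros and helper below are simplified stand-ins for the kernel's RED
      code and its new TC_RED_NODROP flag, not the real implementation): when
      RED decides a packet should be marked, ECN-capable traffic is marked,
      and with nodrop set the non-ECT remainder is enqueued instead of being
      early-dropped.

      #include <stdbool.h>
      #include <stdio.h>

      #define RED_FLAG_ECN    0x1
      #define RED_FLAG_NODROP 0x2   /* plays the role of TC_RED_NODROP */

      enum verdict { ENQUEUE, MARK_AND_ENQUEUE, EARLY_DROP };

      static enum verdict red_decide(unsigned flags, bool red_says_mark, bool pkt_is_ect)
      {
          if (!red_says_mark)
              return ENQUEUE;              /* RED let the packet through */
          if ((flags & RED_FLAG_ECN) && pkt_is_ect)
              return MARK_AND_ENQUEUE;     /* ECN-capable: mark instead of drop */
          if (flags & RED_FLAG_NODROP)
              return ENQUEUE;              /* nodrop: keep non-ECT traffic too */
          return EARLY_DROP;               /* classic RED behaviour */
      }

      int main(void)
      {
          printf("non-ECT, ecn only   -> %d\n",
                 red_decide(RED_FLAG_ECN, true, false));
          printf("non-ECT, ecn nodrop -> %d\n",
                 red_decide(RED_FLAG_ECN | RED_FLAG_NODROP, true, false));
          return 0;
      }

      Recent iproute2 releases expose a matching 'nodrop' keyword for tc-red,
      assuming the installed tc is new enough to know about the flag.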
  6. 06 Mar, 2020 · 1 commit
  7. 20 Feb, 2020 · 2 commits
  8. 18 Feb, 2020 · 1 commit
  9. 27 Jan, 2020 · 1 commit
    • net_sched: fix ops->bind_class() implementations · 2e24cd75
      Cong Wang authored
      The current implementations of ops->bind_class() merely search for the
      classid and update the class in struct tcf_result, without invoking
      either cl_ops->bind_tcf() or cl_ops->unbind_tcf(). This breaks their
      design, as qdiscs like cbq use these callbacks to count filters too.
      This is why syzbot triggered the warning in cbq_destroy_class().
      
      In order to fix this, we have to call cl_ops->bind_tcf() and
      cl_ops->unbind_tcf() like the filter binding path. This patch does
      so by refactoring out two helper functions __tcf_bind_filter()
      and __tcf_unbind_filter(), which are lockless and accept a Qdisc
      pointer, then teaching each implementation to call them correctly.
      
      Note, we merely pass the Qdisc pointer as an opaque pointer to each
      filter; filters only need to pass it down to the helper functions
      without interpreting it at all.
      
      Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
      Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
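
      A simplified, self-contained sketch of the shared bind/unbind pattern
      (the structures and helpers below are plain stand-ins, not the kernel's
      __tcf_bind_filter()/__tcf_unbind_filter()): both the filter path and the
      ops->bind_class() path go through the same helpers, so per-class filter
      counts stay balanced.

      #include <stdio.h>

      struct tc_class { int filter_cnt; };           /* stands in for a cbq/htb class */
      struct tcf_result { struct tc_class *class; }; /* what a filter resolves to */

      static void bind_filter(struct tcf_result *r, struct tc_class *cl)
      {
          struct tc_class *old = r->class;

          cl->filter_cnt++;          /* cl_ops->bind_tcf() analogue */
          r->class = cl;
          if (old)
              old->filter_cnt--;     /* cl_ops->unbind_tcf() analogue */
      }

      static void unbind_filter(struct tcf_result *r)
      {
          if (r->class) {
              r->class->filter_cnt--;
              r->class = NULL;
          }
      }

      int main(void)
      {
          struct tc_class a = {0}, b = {0};
          struct tcf_result res = {0};

          bind_filter(&res, &a);   /* filter add path */
          bind_filter(&res, &b);   /* bind_class() path: class moved, counts stay balanced */
          unbind_filter(&res);     /* filter delete path */
          printf("a=%d b=%d\n", a.filter_cnt, b.filter_cnt); /* both back to 0 */
          return 0;
      }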
  10. 25 Jan, 2020 · 1 commit
  11. 19 Dec, 2019 · 2 commits
  12. 27 Aug, 2019 · 3 commits
    • net: sched: take reference to action dev before calling offloads · 5a6ff4b1
      Vlad Buslov authored
      In order to remove the dependency on the rtnl lock when calling the
      hardware offload API, take a reference to the mirred action's dev when
      initializing the flow_action structure in tc_setup_flow_action().
      Implement a new function, tc_cleanup_flow_action(), and use it to
      release the device after the hardware offload API is done using it.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
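
      A rough C sketch of the hold/release pattern this describes (the
      counters and names below are plain stand-ins for dev_hold()/dev_put(),
      not kernel code): the setup path pins the device before stashing its
      pointer in the flow_action entry, and a dedicated cleanup function
      releases it once the offload code can no longer touch it.

      #include <stdio.h>

      struct net_device_stub { int refcnt; const char *name; };

      static void dev_hold_stub(struct net_device_stub *d) { d->refcnt++; }
      static void dev_put_stub(struct net_device_stub *d)  { d->refcnt--; }

      struct flow_action_entry_stub { struct net_device_stub *dev; };

      /* tc_setup_flow_action() analogue: stash the mirred target and pin it */
      static void setup_flow_action(struct flow_action_entry_stub *e,
                                    struct net_device_stub *mirred_dev)
      {
          dev_hold_stub(mirred_dev);
          e->dev = mirred_dev;
      }

      /* tc_cleanup_flow_action() analogue: release the device once offloading is done */
      static void cleanup_flow_action(struct flow_action_entry_stub *e)
      {
          if (e->dev) {
              dev_put_stub(e->dev);
              e->dev = NULL;
          }
      }

      int main(void)
      {
          struct net_device_stub eth = { .refcnt = 1, .name = "eth0" };
          struct flow_action_entry_stub entry = {0};

          setup_flow_action(&entry, &eth);
          /* ... hardware offload callbacks may dereference entry.dev here ... */
          cleanup_flow_action(&entry);
          printf("%s refcnt=%d\n", eth.name, eth.refcnt);
          return 0;
      }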
    • net: sched: take rtnl lock in tc_setup_flow_action() · 9838b20a
      Vlad Buslov authored
      In order to allow using the new flow_action infrastructure from unlocked
      classifiers, modify tc_setup_flow_action() to accept a new 'rtnl_held'
      argument. Take the rtnl lock before accessing tc_action data. This is
      necessary to protect against concurrent action replace.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
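
      A minimal userspace sketch of the 'rtnl_held' convention (the mutex,
      data and function below are stand-ins, not the kernel's rtnl lock or
      tc_setup_flow_action()): the function takes the lock only when the
      caller does not already hold it, so both the locked legacy path and the
      unlocked classifier path read the shared action data safely.

      #include <pthread.h>
      #include <stdbool.h>
      #include <stdio.h>

      static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* rtnl analogue */
      static int shared_action_data = 42;                          /* tc_action analogue */

      static int setup_flow_action(bool lock_held)
      {
          int snapshot;

          if (!lock_held)
              pthread_mutex_lock(&big_lock);
          snapshot = shared_action_data;   /* read without racing a concurrent replace */
          if (!lock_held)
              pthread_mutex_unlock(&big_lock);
          return snapshot;
      }

      int main(void)
      {
          /* unlocked classifier path: the function takes the lock itself */
          printf("%d\n", setup_flow_action(false));

          /* locked (legacy) path: the caller already holds it */
          pthread_mutex_lock(&big_lock);
          printf("%d\n", setup_flow_action(true));
          pthread_mutex_unlock(&big_lock);
          return 0;
      }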
    • net: sched: refactor block offloads counter usage · 40119211
      Vlad Buslov authored
      Without rtnl lock protection, filters can no longer safely manage the
      block offloads counter themselves. Refactor the cls API to protect the
      block offloadcnt with tcf_block->cb_lock, which is already used to
      protect the driver callback list and the nooffloaddevcnt counter. The
      counter can be modified by concurrent tasks through the new functions
      that execute block callbacks (which is safe with the previous patch that
      changed its type to atomic_t); however, the block bind/unbind code that
      checks the counter value takes cb_lock in write mode to exclude any
      concurrent modifications. This approach prevents race conditions between
      bind/unbind and callback execution code, but allows concurrency on the
      tc rule update path.
      
      Move the block offload counter, the filter's in-hardware counter and the
      filter flags management from classifiers into the cls hardware offloads
      API. Make tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update()
      private to the cls API. Implement the following new cls API functions to
      be used instead:
      
        tc_setup_cb_add() - non-destructive filter add. If a filter that wasn't
        already in hardware is successfully offloaded, increment the block
        offloads counter and set the filter's in-hardware counter and flag. On
        failure, the previously offloaded filter is considered to be intact and
        the offloads counter is not decremented.

        tc_setup_cb_replace() - destructive filter replace. Release the
        existing filter's block offload counter and reset its in-hardware
        counter and flag. Set the new filter's in-hardware counter and flag. On
        failure, the previously offloaded filter is considered to be destroyed
        and the offload counter is decremented.

        tc_setup_cb_destroy() - filter destroy. Unconditionally decrement the
        block offloads counter.

        tc_setup_cb_reoffload() - reoffload filter to a single cb. Execute cb()
        and call tc_cls_offload_cnt_update() if cb() didn't return an error.
      
      Refactor all offload-capable classifiers to atomically offload filters
      to hardware, change the block offload counter, and set the filter's
      in-hardware counter and flag by means of the new cls API functions.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
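
      A rough, self-contained sketch of the locking scheme described above
      (pthread primitives stand in for cb_lock and the atomic counter, and the
      function names are illustrative rather than the real cls API): the
      callback-executing update path bumps the counter under the lock in read
      mode, while block bind/unbind takes the same lock in write mode so it
      observes a stable value.

      /* build with: cc -pthread sketch.c */
      #include <pthread.h>
      #include <stdatomic.h>
      #include <stdio.h>

      struct block_stub {
          pthread_rwlock_t cb_lock;   /* tcf_block->cb_lock analogue */
          atomic_int offloadcnt;      /* block offloads counter */
      };

      /* tc_setup_cb_add() analogue: run driver callbacks, bump the counter on success */
      static void cb_add(struct block_stub *b)
      {
          pthread_rwlock_rdlock(&b->cb_lock);
          /* ... execute driver block callbacks here ... */
          atomic_fetch_add(&b->offloadcnt, 1);
          pthread_rwlock_unlock(&b->cb_lock);
      }

      /* bind/unbind analogue: write lock excludes concurrent counter updates */
      static int block_bind_check(struct block_stub *b)
      {
          int cnt;

          pthread_rwlock_wrlock(&b->cb_lock);
          cnt = atomic_load(&b->offloadcnt);  /* stable: no rule update can race us */
          pthread_rwlock_unlock(&b->cb_lock);
          return cnt;
      }

      int main(void)
      {
          struct block_stub b = { .offloadcnt = 0 };

          pthread_rwlock_init(&b.cb_lock, NULL);
          cb_add(&b);
          printf("offloadcnt=%d\n", block_bind_check(&b));
          pthread_rwlock_destroy(&b.cb_lock);
          return 0;
      }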
  13. 19 Aug, 2019 · 1 commit
  14. 09 Aug, 2019 · 1 commit
  15. 20 Jul, 2019 · 1 commit
  16. 13 Jul, 2019 · 1 commit
    • net: sched: Fix NULL-pointer dereference in tc_indr_block_ing_cmd() · c1a970d0
      Vlad Buslov authored
      After the recent refactoring of the block offloads infrastructure, the
      indr_dev->block pointer is dereferenced before it is verified to be
      non-NULL. Example stack trace where this behavior leads to a
      NULL-pointer dereference when creating a vxlan dev on a system with an
      mlx5 NIC with offloads enabled:
      
      [ 1157.852938] ==================================================================
      [ 1157.866877] BUG: KASAN: null-ptr-deref in tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1157.880877] Read of size 4 at addr 0000000000000090 by task ip/3829
      [ 1157.901637] CPU: 22 PID: 3829 Comm: ip Not tainted 5.2.0-rc6+ #488
      [ 1157.914438] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [ 1157.929031] Call Trace:
      [ 1157.938318]  dump_stack+0x9a/0xeb
      [ 1157.948362]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1157.960262]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1157.972082]  __kasan_report+0x176/0x192
      [ 1157.982513]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1157.994348]  kasan_report+0xe/0x20
      [ 1158.004324]  tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1158.015950]  ? tcf_block_setup+0x430/0x430
      [ 1158.026558]  ? kasan_unpoison_shadow+0x30/0x40
      [ 1158.037464]  __tc_indr_block_cb_register+0x5f5/0xf20
      [ 1158.049288]  ? mlx5e_rep_indr_tc_block_unbind+0xa0/0xa0 [mlx5_core]
      [ 1158.062344]  ? tc_indr_block_dev_put.part.47+0x5c0/0x5c0
      [ 1158.074498]  ? rdma_roce_rescan_device+0x20/0x20 [ib_core]
      [ 1158.086580]  ? br_device_event+0x98/0x480 [bridge]
      [ 1158.097870]  ? strcmp+0x30/0x50
      [ 1158.107578]  mlx5e_nic_rep_netdevice_event+0xdd/0x180 [mlx5_core]
      [ 1158.120212]  notifier_call_chain+0x6d/0xa0
      [ 1158.130753]  register_netdevice+0x6fc/0x7e0
      [ 1158.141322]  ? netdev_change_features+0xa0/0xa0
      [ 1158.152218]  ? vxlan_config_apply+0x210/0x310 [vxlan]
      [ 1158.163593]  __vxlan_dev_create+0x2ad/0x520 [vxlan]
      [ 1158.174770]  ? vxlan_changelink+0x490/0x490 [vxlan]
      [ 1158.185870]  ? rcu_read_unlock+0x60/0x60 [vxlan]
      [ 1158.196798]  vxlan_newlink+0x99/0xf0 [vxlan]
      [ 1158.207303]  ? __vxlan_dev_create+0x520/0x520 [vxlan]
      [ 1158.218601]  ? rtnl_create_link+0x3d0/0x450
      [ 1158.228900]  __rtnl_newlink+0x8a7/0xb00
      [ 1158.238701]  ? stack_access_ok+0x35/0x80
      [ 1158.248450]  ? rtnl_link_unregister+0x1a0/0x1a0
      [ 1158.258735]  ? find_held_lock+0x6d/0xd0
      [ 1158.268379]  ? is_bpf_text_address+0x67/0xf0
      [ 1158.278330]  ? lock_acquire+0xc1/0x1f0
      [ 1158.287686]  ? is_bpf_text_address+0x5/0xf0
      [ 1158.297449]  ? is_bpf_text_address+0x86/0xf0
      [ 1158.307310]  ? kernel_text_address+0xec/0x100
      [ 1158.317155]  ? arch_stack_walk+0x92/0xe0
      [ 1158.326497]  ? __kernel_text_address+0xe/0x30
      [ 1158.336213]  ? unwind_get_return_address+0x2f/0x50
      [ 1158.346267]  ? create_prof_cpu_mask+0x20/0x20
      [ 1158.355936]  ? arch_stack_walk+0x92/0xe0
      [ 1158.365117]  ? stack_trace_save+0x8a/0xb0
      [ 1158.374272]  ? stack_trace_consume_entry+0x80/0x80
      [ 1158.384226]  ? match_held_lock+0x33/0x210
      [ 1158.393216]  ? kasan_unpoison_shadow+0x30/0x40
      [ 1158.402593]  rtnl_newlink+0x53/0x80
      [ 1158.410925]  rtnetlink_rcv_msg+0x3a5/0x600
      [ 1158.419777]  ? validate_linkmsg+0x400/0x400
      [ 1158.428620]  ? find_held_lock+0x6d/0xd0
      [ 1158.437117]  ? match_held_lock+0x1b/0x210
      [ 1158.445760]  ? validate_linkmsg+0x400/0x400
      [ 1158.454642]  netlink_rcv_skb+0xc7/0x1f0
      [ 1158.463150]  ? netlink_ack+0x470/0x470
      [ 1158.471538]  ? netlink_deliver_tap+0x1f3/0x5a0
      [ 1158.480607]  netlink_unicast+0x2ae/0x350
      [ 1158.489099]  ? netlink_attachskb+0x340/0x340
      [ 1158.497935]  ? _copy_from_iter_full+0xde/0x3b0
      [ 1158.506945]  ? __virt_addr_valid+0xb6/0xf0
      [ 1158.515578]  ? __check_object_size+0x159/0x240
      [ 1158.524515]  netlink_sendmsg+0x4d3/0x630
      [ 1158.532879]  ? netlink_unicast+0x350/0x350
      [ 1158.541400]  ? netlink_unicast+0x350/0x350
      [ 1158.549805]  sock_sendmsg+0x94/0xa0
      [ 1158.557561]  ___sys_sendmsg+0x49d/0x570
      [ 1158.565625]  ? copy_msghdr_from_user+0x210/0x210
      [ 1158.574457]  ? __fput+0x1e2/0x330
      [ 1158.581948]  ? __kasan_slab_free+0x130/0x180
      [ 1158.590407]  ? kmem_cache_free+0xb6/0x2d0
      [ 1158.598574]  ? mark_lock+0xc7/0x790
      [ 1158.606177]  ? task_work_run+0xcf/0x100
      [ 1158.614165]  ? exit_to_usermode_loop+0x102/0x110
      [ 1158.622954]  ? __lock_acquire+0x963/0x1ee0
      [ 1158.631199]  ? lockdep_hardirqs_on+0x260/0x260
      [ 1158.639777]  ? match_held_lock+0x1b/0x210
      [ 1158.647918]  ? lockdep_hardirqs_on+0x260/0x260
      [ 1158.656501]  ? match_held_lock+0x1b/0x210
      [ 1158.664643]  ? __fget_light+0xa6/0xe0
      [ 1158.672423]  ? __sys_sendmsg+0xd2/0x150
      [ 1158.680334]  __sys_sendmsg+0xd2/0x150
      [ 1158.688063]  ? __ia32_sys_shutdown+0x30/0x30
      [ 1158.696435]  ? lock_downgrade+0x2e0/0x2e0
      [ 1158.704541]  ? mark_held_locks+0x1a/0x90
      [ 1158.712611]  ? mark_held_locks+0x1a/0x90
      [ 1158.720619]  ? do_syscall_64+0x1e/0x2c0
      [ 1158.728530]  do_syscall_64+0x78/0x2c0
      [ 1158.736254]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 1158.745414] RIP: 0033:0x7f62d505cb87
      [ 1158.753070] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00[87/1817]
       48 89 f3 48
      [ 1158.780924] RSP: 002b:00007fffd9832268 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [ 1158.793204] RAX: ffffffffffffffda RBX: 000000005d26048f RCX: 00007f62d505cb87
      [ 1158.805111] RDX: 0000000000000000 RSI: 00007fffd98322d0 RDI: 0000000000000003
      [ 1158.817055] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
      [ 1158.828987] R10: 00007f62d50ce260 R11: 0000000000000246 R12: 0000000000000001
      [ 1158.840909] R13: 000000000067e540 R14: 0000000000000000 R15: 000000000067ed20
      [ 1158.852873] ==================================================================
      
      Introduce a new function, tcf_block_non_null_shared(), that verifies the
      block pointer before dereferencing it to obtain the index. Use the
      function in tc_indr_block_ing_cmd() to prevent the NULL pointer
      dereference.
      
      Fixes: 955bcb6e ("drivers: net: use flow block API")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
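
      A minimal sketch of the null-safe check the fix adds (the struct and
      helper below are simplified stand-ins, not the kernel's tcf_block or
      tcf_block_non_null_shared()): the helper validates the pointer before
      any caller reads the block index.

      #include <stdbool.h>
      #include <stdio.h>

      struct tcf_block_stub { unsigned int index; bool shared; };

      /* tcf_block_non_null_shared() analogue */
      static bool block_non_null_shared(const struct tcf_block_stub *block)
      {
          return block && block->shared;
      }

      int main(void)
      {
          struct tcf_block_stub *block = NULL;   /* indr_dev->block not set up yet */

          if (block_non_null_shared(block))      /* safe: NULL is never dereferenced */
              printf("index %u\n", block->index);
          else
              printf("block not present or not shared\n");
          return 0;
      }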
  17. 10 Jul, 2019 · 6 commits
  18. 29 Jun, 2019 · 1 commit
    • net: sched: refactor reinsert action · 720f22fe
      John Hurley authored
      The TC_ACT_REINSERT return type was added as an in-kernel-only option to
      allow a packet ingress or egress redirect. This is used to avoid
      unnecessary skb clones in situations where they are not required. If a
      TC hook returns this code, the packet is 'reinserted' and no skb consume
      is carried out, as no clone took place.
      
      This return type is only used in act_mirred. Rather than have the reinsert
      called from the main datapath, call it directly in act_mirred. Instead of
      returning TC_ACT_REINSERT, change the type to the new TC_ACT_CONSUMED
      which tells the caller that the packet has been stolen by another process
      and that no consume call is required.
      
      Moving all redirect calls to the act_mirred code is in preparation for
      tracking recursion created by act_mirred.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
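
      A small C sketch of the ownership convention described above (simplified
      types and names, not the kernel's TC code): the action takes ownership of
      the packet and returns a 'consumed' verdict, so the caller skips its own
      consume/free, which is the role TC_ACT_CONSUMED plays for act_mirred.

      #include <stdio.h>
      #include <stdlib.h>

      enum { ACT_OK, ACT_CONSUMED };   /* ACT_CONSUMED mirrors TC_ACT_CONSUMED */

      struct pkt { int len; };

      /* act_mirred analogue: redirects the packet itself instead of asking the caller to */
      static int mirred_act(struct pkt *p)
      {
          /* ... hand p directly to the target device's ingress/egress path ... */
          free(p);                  /* ownership transferred: the action consumes it */
          return ACT_CONSUMED;
      }

      static void datapath(struct pkt *p)
      {
          int verdict = mirred_act(p);

          if (verdict == ACT_CONSUMED)
              return;               /* packet stolen: no consume/free in the caller */
          free(p);                  /* normal verdicts: caller still owns the packet */
      }

      int main(void)
      {
          struct pkt *p = malloc(sizeof(*p));

          if (!p)
              return 1;
          p->len = 64;
          datapath(p);
          puts("done");
          return 0;
      }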
  19. 16 Jun, 2019 · 1 commit
  20. 08 May, 2019 · 1 commit
  21. 06 May, 2019 · 4 commits
  22. 26 Feb, 2019 · 1 commit
  23. 23 Feb, 2019 · 1 commit
  24. 13 Feb, 2019 · 4 commits
  25. 07 Feb, 2019 · 1 commit