1. 27 Aug, 2019 (3 commits)
    • net: sched: take reference to action dev before calling offloads · 5a6ff4b1
      Committed by Vlad Buslov
      In order to remove the dependency on the rtnl lock when calling the
      hardware offload API, take a reference to the action mirred dev when
      initializing the flow_action structure in tc_setup_flow_action().
      Implement a new function, tc_cleanup_flow_action(), and use it to
      release the device once the hardware offload API is done with it.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
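      The reference pattern this commit describes can be sketched roughly as
      follows. This is an illustrative sketch, not the verbatim kernel diff;
      the helper names flow_action_entry_hold_dev() and
      flow_action_entry_put_dev() are hypothetical.

        #include <linux/netdevice.h>
        #include <net/flow_offload.h>

        /* Hold the mirred device while the flow_action entry is handed to
         * hardware offload callbacks, so it stays valid without rtnl.
         */
        static void flow_action_entry_hold_dev(struct flow_action_entry *entry,
                                               struct net_device *dev)
        {
                dev_hold(dev);
                entry->dev = dev;
        }

        /* Paired release, called from tc_cleanup_flow_action() once the
         * hardware offload API is done with the entry.
         */
        static void flow_action_entry_put_dev(struct flow_action_entry *entry)
        {
                if (entry->dev)
                        dev_put(entry->dev);
        }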
    • net: sched: take rtnl lock in tc_setup_flow_action() · 9838b20a
      Committed by Vlad Buslov
      In order to allow using the new flow_action infrastructure from unlocked
      classifiers, modify tc_setup_flow_action() to accept a new 'rtnl_held'
      argument. Take the rtnl lock before accessing tc_action data; this is
      necessary to protect against a concurrent action replace.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
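      A minimal sketch of the locking pattern this commit describes. Only the
      'rtnl_held' argument is taken from the commit text; the function name
      and body below are illustrative, not the upstream implementation.

        #include <linux/rtnetlink.h>
        #include <net/flow_offload.h>
        #include <net/pkt_cls.h>

        static int setup_flow_action_sketch(struct flow_action *flow_action,
                                            const struct tcf_exts *exts,
                                            bool rtnl_held)
        {
                int err = 0;

                if (!exts)
                        return 0;

                /* Unlocked classifiers call in without rtnl; take it here to
                 * protect tc_action data from a concurrent action replace.
                 */
                if (!rtnl_held)
                        rtnl_lock();

                /* ... translate tc actions into flow_action entries ... */

                if (!rtnl_held)
                        rtnl_unlock();
                return err;
        }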
    • net: sched: refactor block offloads counter usage · 40119211
      Committed by Vlad Buslov
      Without rtnl lock protection, filters can no longer safely manage the
      block offloads counter themselves. Refactor the cls API to protect the
      block offloadcnt with tcf_block->cb_lock, which is already used to
      protect the driver callback list and the nooffloaddevcnt counter. The
      counter can be modified by concurrent tasks through the new functions
      that execute block callbacks (which is safe thanks to the previous patch
      that changed its type to atomic_t); however, the block bind/unbind code
      that checks the counter value takes cb_lock in write mode to exclude any
      concurrent modifications. This approach prevents races between
      bind/unbind and callback execution while still allowing concurrency on
      the tc rule update path.
      
      Move the block offload counter, the filter in-hardware counter and the
      filter flags management from the classifiers into the cls hardware
      offloads API. Make tcf_block_offload_{inc|dec}() and
      tc_cls_offload_cnt_update() private to the cls API. Implement the
      following new cls API functions to be used instead (a locking sketch
      follows this entry):
      
        tc_setup_cb_add() - non-destructive filter add. If a filter that
        wasn't already in hardware is successfully offloaded, increment the
        block offloads counter and set the filter's in-hardware counter and
        flag. On failure, the previously offloaded filter is considered intact
        and the offloads counter is not decremented.
      
        tc_setup_cb_replace() - destructive filter replace. Release the
        existing filter's block offload counter and reset its in-hardware
        counter and flag. Set the new filter's in-hardware counter and flag.
        On failure, the previously offloaded filter is considered destroyed
        and the offloads counter is decremented.
      
        tc_setup_cb_destroy() - filter destroy. Unconditionally decrement the
        block offloads counter.
      
        tc_setup_cb_reoffload() - reoffload a filter to a single cb. Execute
        cb() and call tc_cls_offload_cnt_update() if cb() didn't return an
        error.
      
      Refactor all offload-capable classifiers to atomically offload filters
      to hardware, update the block offload counter, and set the filter's
      in-hardware counter and flag through the new cls API functions.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
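      The cb_lock/offloadcnt protection scheme described above can be sketched
      roughly as follows. Field and helper names mirror the commit text, but
      the code is illustrative, not the verbatim kernel implementation.

        #include <linux/atomic.h>
        #include <linux/rwsem.h>

        struct block_sketch {
                struct rw_semaphore cb_lock; /* protects cb_list, offloadcnt, nooffloaddevcnt */
                atomic_t offloadcnt;         /* filters currently offloaded to hardware */
        };

        /* Filter update path (tc_setup_cb_add() and friends): runs under the
         * read side of cb_lock, so rule updates stay concurrent and the
         * counter is modified atomically.
         */
        static void block_offload_inc(struct block_sketch *block)
        {
                down_read(&block->cb_lock);
                atomic_inc(&block->offloadcnt);
                up_read(&block->cb_lock);
        }

        /* Block bind/unbind path: takes cb_lock in write mode, excluding any
         * concurrent counter modification while the value is checked.
         */
        static bool block_has_offloads(struct block_sketch *block)
        {
                bool busy;

                down_write(&block->cb_lock);
                busy = atomic_read(&block->offloadcnt) != 0;
                up_write(&block->cb_lock);

                return busy;
        }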
  2. 19 Aug, 2019 (1 commit)
  3. 09 Aug, 2019 (1 commit)
  4. 20 Jul, 2019 (1 commit)
  5. 13 Jul, 2019 (1 commit)
    • net: sched: Fix NULL-pointer dereference in tc_indr_block_ing_cmd() · c1a970d0
      Committed by Vlad Buslov
      After the recent refactoring of the block offloads infrastructure, the
      indr_dev->block pointer is dereferenced before it is verified to be
      non-NULL. Example stack trace where this leads to a NULL-pointer
      dereference when creating a vxlan dev on a system with an mlx5 NIC with
      offloads enabled:
      
      [ 1157.852938] ==================================================================
      [ 1157.866877] BUG: KASAN: null-ptr-deref in tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1157.880877] Read of size 4 at addr 0000000000000090 by task ip/3829
      [ 1157.901637] CPU: 22 PID: 3829 Comm: ip Not tainted 5.2.0-rc6+ #488
      [ 1157.914438] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [ 1157.929031] Call Trace:
      [ 1157.938318]  dump_stack+0x9a/0xeb
      [ 1157.948362]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1157.960262]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1157.972082]  __kasan_report+0x176/0x192
      [ 1157.982513]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1157.994348]  kasan_report+0xe/0x20
      [ 1158.004324]  tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1158.015950]  ? tcf_block_setup+0x430/0x430
      [ 1158.026558]  ? kasan_unpoison_shadow+0x30/0x40
      [ 1158.037464]  __tc_indr_block_cb_register+0x5f5/0xf20
      [ 1158.049288]  ? mlx5e_rep_indr_tc_block_unbind+0xa0/0xa0 [mlx5_core]
      [ 1158.062344]  ? tc_indr_block_dev_put.part.47+0x5c0/0x5c0
      [ 1158.074498]  ? rdma_roce_rescan_device+0x20/0x20 [ib_core]
      [ 1158.086580]  ? br_device_event+0x98/0x480 [bridge]
      [ 1158.097870]  ? strcmp+0x30/0x50
      [ 1158.107578]  mlx5e_nic_rep_netdevice_event+0xdd/0x180 [mlx5_core]
      [ 1158.120212]  notifier_call_chain+0x6d/0xa0
      [ 1158.130753]  register_netdevice+0x6fc/0x7e0
      [ 1158.141322]  ? netdev_change_features+0xa0/0xa0
      [ 1158.152218]  ? vxlan_config_apply+0x210/0x310 [vxlan]
      [ 1158.163593]  __vxlan_dev_create+0x2ad/0x520 [vxlan]
      [ 1158.174770]  ? vxlan_changelink+0x490/0x490 [vxlan]
      [ 1158.185870]  ? rcu_read_unlock+0x60/0x60 [vxlan]
      [ 1158.196798]  vxlan_newlink+0x99/0xf0 [vxlan]
      [ 1158.207303]  ? __vxlan_dev_create+0x520/0x520 [vxlan]
      [ 1158.218601]  ? rtnl_create_link+0x3d0/0x450
      [ 1158.228900]  __rtnl_newlink+0x8a7/0xb00
      [ 1158.238701]  ? stack_access_ok+0x35/0x80
      [ 1158.248450]  ? rtnl_link_unregister+0x1a0/0x1a0
      [ 1158.258735]  ? find_held_lock+0x6d/0xd0
      [ 1158.268379]  ? is_bpf_text_address+0x67/0xf0
      [ 1158.278330]  ? lock_acquire+0xc1/0x1f0
      [ 1158.287686]  ? is_bpf_text_address+0x5/0xf0
      [ 1158.297449]  ? is_bpf_text_address+0x86/0xf0
      [ 1158.307310]  ? kernel_text_address+0xec/0x100
      [ 1158.317155]  ? arch_stack_walk+0x92/0xe0
      [ 1158.326497]  ? __kernel_text_address+0xe/0x30
      [ 1158.336213]  ? unwind_get_return_address+0x2f/0x50
      [ 1158.346267]  ? create_prof_cpu_mask+0x20/0x20
      [ 1158.355936]  ? arch_stack_walk+0x92/0xe0
      [ 1158.365117]  ? stack_trace_save+0x8a/0xb0
      [ 1158.374272]  ? stack_trace_consume_entry+0x80/0x80
      [ 1158.384226]  ? match_held_lock+0x33/0x210
      [ 1158.393216]  ? kasan_unpoison_shadow+0x30/0x40
      [ 1158.402593]  rtnl_newlink+0x53/0x80
      [ 1158.410925]  rtnetlink_rcv_msg+0x3a5/0x600
      [ 1158.419777]  ? validate_linkmsg+0x400/0x400
      [ 1158.428620]  ? find_held_lock+0x6d/0xd0
      [ 1158.437117]  ? match_held_lock+0x1b/0x210
      [ 1158.445760]  ? validate_linkmsg+0x400/0x400
      [ 1158.454642]  netlink_rcv_skb+0xc7/0x1f0
      [ 1158.463150]  ? netlink_ack+0x470/0x470
      [ 1158.471538]  ? netlink_deliver_tap+0x1f3/0x5a0
      [ 1158.480607]  netlink_unicast+0x2ae/0x350
      [ 1158.489099]  ? netlink_attachskb+0x340/0x340
      [ 1158.497935]  ? _copy_from_iter_full+0xde/0x3b0
      [ 1158.506945]  ? __virt_addr_valid+0xb6/0xf0
      [ 1158.515578]  ? __check_object_size+0x159/0x240
      [ 1158.524515]  netlink_sendmsg+0x4d3/0x630
      [ 1158.532879]  ? netlink_unicast+0x350/0x350
      [ 1158.541400]  ? netlink_unicast+0x350/0x350
      [ 1158.549805]  sock_sendmsg+0x94/0xa0
      [ 1158.557561]  ___sys_sendmsg+0x49d/0x570
      [ 1158.565625]  ? copy_msghdr_from_user+0x210/0x210
      [ 1158.574457]  ? __fput+0x1e2/0x330
      [ 1158.581948]  ? __kasan_slab_free+0x130/0x180
      [ 1158.590407]  ? kmem_cache_free+0xb6/0x2d0
      [ 1158.598574]  ? mark_lock+0xc7/0x790
      [ 1158.606177]  ? task_work_run+0xcf/0x100
      [ 1158.614165]  ? exit_to_usermode_loop+0x102/0x110
      [ 1158.622954]  ? __lock_acquire+0x963/0x1ee0
      [ 1158.631199]  ? lockdep_hardirqs_on+0x260/0x260
      [ 1158.639777]  ? match_held_lock+0x1b/0x210
      [ 1158.647918]  ? lockdep_hardirqs_on+0x260/0x260
      [ 1158.656501]  ? match_held_lock+0x1b/0x210
      [ 1158.664643]  ? __fget_light+0xa6/0xe0
      [ 1158.672423]  ? __sys_sendmsg+0xd2/0x150
      [ 1158.680334]  __sys_sendmsg+0xd2/0x150
      [ 1158.688063]  ? __ia32_sys_shutdown+0x30/0x30
      [ 1158.696435]  ? lock_downgrade+0x2e0/0x2e0
      [ 1158.704541]  ? mark_held_locks+0x1a/0x90
      [ 1158.712611]  ? mark_held_locks+0x1a/0x90
      [ 1158.720619]  ? do_syscall_64+0x1e/0x2c0
      [ 1158.728530]  do_syscall_64+0x78/0x2c0
      [ 1158.736254]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 1158.745414] RIP: 0033:0x7f62d505cb87
      [ 1158.753070] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00[87/1817]
       48 89 f3 48
      [ 1158.780924] RSP: 002b:00007fffd9832268 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [ 1158.793204] RAX: ffffffffffffffda RBX: 000000005d26048f RCX: 00007f62d505cb87
      [ 1158.805111] RDX: 0000000000000000 RSI: 00007fffd98322d0 RDI: 0000000000000003
      [ 1158.817055] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
      [ 1158.828987] R10: 00007f62d50ce260 R11: 0000000000000246 R12: 0000000000000001
      [ 1158.840909] R13: 000000000067e540 R14: 0000000000000000 R15: 000000000067ed20
      [ 1158.852873] ==================================================================
      
      Introduce a new function, tcf_block_non_null_shared(), that verifies
      the block pointer before dereferencing it to obtain the index. Use it in
      tc_indr_block_ing_cmd() to prevent the NULL-pointer dereference.
      
      Fixes: 955bcb6e ("drivers: net: use flow block API")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
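      The helper named in the commit likely amounts to a simple guard along
      these lines. The function name comes from the commit text; the exact
      body is an assumption based on the description above, not a verbatim
      copy of the kernel header.

        #include <net/sch_generic.h>

        /* Return true only for a shared block whose index can safely be read,
         * so tc_indr_block_ing_cmd() never dereferences a NULL block pointer.
         */
        static inline bool tcf_block_non_null_shared(struct tcf_block *block)
        {
                return block && block->index;
        }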
  6. 10 Jul, 2019 (6 commits)
  7. 29 Jun, 2019 (1 commit)
    • net: sched: refactor reinsert action · 720f22fe
      Committed by John Hurley
      The TC_ACT_REINSERT return type was added as an in-kernel-only option to
      allow a packet ingress or egress redirect. It is used to avoid
      unnecessary skb clones in situations where they are not required. If a
      TC hook returns this code, the packet is 'reinserted' and no skb consume
      is carried out, because no clone took place.
      
      This return type is only used in act_mirred. Rather than have the
      reinsert called from the main datapath, call it directly in act_mirred.
      Instead of returning TC_ACT_REINSERT, change the type to the new
      TC_ACT_CONSUMED, which tells the caller that the packet has been stolen
      by another process and that no consume call is required.

      Moving all redirect calls into the act_mirred code prepares for tracking
      the recursion created by act_mirred.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
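      A conceptual sketch of the act_mirred change. Apart from
      TC_ACT_CONSUMED, the helper name and body below are illustrative and not
      the verbatim kernel code.

        #include <linux/netdevice.h>
        #include <linux/skbuff.h>
        #include <net/sch_generic.h>

        static int mirred_redirect_sketch(struct sk_buff *skb, bool want_ingress)
        {
                /* Redirect the original skb directly from act_mirred instead
                 * of handing it back to the datapath for reinsertion.
                 */
                if (want_ingress)
                        netif_receive_skb(skb);
                else
                        dev_queue_xmit(skb);

                /* Tell the caller the packet was stolen: no clone was made,
                 * so no consume/free is required on this path.
                 */
                return TC_ACT_CONSUMED;
        }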
  8. 16 Jun, 2019 (1 commit)
  9. 08 May, 2019 (1 commit)
  10. 06 May, 2019 (4 commits)
  11. 26 Feb, 2019 (1 commit)
  12. 23 Feb, 2019 (1 commit)
  13. 13 Feb, 2019 (4 commits)
  14. 07 Feb, 2019 (5 commits)
  15. 15 Dec, 2018 (1 commit)
  16. 20 Nov, 2018 (3 commits)
  17. 15 Nov, 2018 (4 commits)
  18. 12 Nov, 2018 (1 commit)
    • net: sched: register callbacks for indirect tc block binds · 7f76fa36
      Committed by John Hurley
      Currently, drivers can register to receive TC block bind/unbind
      callbacks by implementing the setup_tc ndo in any of their netdevs.
      However, drivers may also be interested in binds to higher-level devices
      (e.g. tunnel drivers) in order to potentially offload filters applied to
      them.

      Introduce indirect block devs, which allow drivers to register callbacks
      for block binds on other devices. The callback is triggered when the
      device is bound to a block, allowing the driver to register for rules
      applied to that block using already available functions.
      
      Freeing an indirect block callback will trigger an unbind event (if
      necessary) to direct the driver to remove any offloaded rules and
      unregister any block rule callbacks. It is the responsibility of the
      implementing driver to clean up any registered indirect block callbacks
      before exiting, if the block is still active at that time.
      
      Allow registering an indirect block dev callback for a device that is
      already bound to a block. In this case (if it is an ingress block),
      register and also immediately trigger the callback, so that any
      already-installed rules can be replayed to the calling driver.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
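      From the driver side, registering for indirect block binds might look
      roughly like the sketch below. The tc_indr_block_cb_register() and
      tc_indr_block_cb_unregister() calls come from this commit, but their
      exact signatures and the callback prototype are assumptions based on the
      description above; check the pkt_cls.h of the corresponding kernel.

        #include <linux/netdevice.h>
        #include <net/pkt_cls.h>

        /* Called whenever a block is bound to or unbound from the tracked
         * higher-level device (e.g. a vxlan netdev).
         */
        static int drv_indr_setup_tc_cb(struct net_device *netdev, void *cb_priv,
                                        enum tc_setup_type type, void *type_data)
        {
                switch (type) {
                case TC_SETUP_BLOCK:
                        /* Register a flow callback on the block so rules
                         * applied to the higher-level device can be offloaded.
                         */
                        return 0;
                default:
                        return -EOPNOTSUPP;
                }
        }

        static void drv_track_tunnel_dev(struct net_device *netdev, void *drv_priv)
        {
                /* If the device is already bound to an ingress block, the
                 * callback is triggered immediately and existing rules are
                 * replayed to the driver.
                 */
                if (tc_indr_block_cb_register(netdev, drv_priv,
                                              drv_indr_setup_tc_cb, drv_priv))
                        pr_warn("failed to register indirect block callback\n");
        }

        static void drv_untrack_tunnel_dev(struct net_device *netdev, void *drv_priv)
        {
                /* Triggers an unbind (if needed) so offloaded rules are removed. */
                tc_indr_block_cb_unregister(netdev, drv_indr_setup_tc_cb, drv_priv);
        }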