1. 28 Aug, 2019 1 commit
    • C
      net_sched: fix a NULL pointer deref in ipt action · 981471bd
      Cong Wang committed
      The net pointer in struct xt_tgdtor_param is not explicitly
      initialized and is therefore still NULL when it is dereferenced.
      So we have to find a way to pass the correct net pointer to
      ipt_destroy_target().
      
      The best way I find is just saving the net pointer inside the per
      netns struct tcf_idrinfo, which could make this patch smaller.
      
      Fixes: 0c66dc1e ("netfilter: conntrack: register hooks in netns when needed by ruleset")
      Reported-and-tested-by: itugrok@yahoo.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Aug, 2019 2 commits
  3. 07 Aug, 2019 1 commit
  4. 06 Aug, 2019 1 commit
    • D
      net: sched: use temporary variable for actions indexes · 7be8ef2c
      Dmytro Linkin committed
      Currently the init call of every action (except ipt) initializes its
      'parm' structure as a direct pointer to nla data in skb. This leads to a
      race condition when some of the filter actions were initialized successfully
      (and were assigned with idr action index that was written directly
      into nla data), but then were deleted and retried (due to following
      action module missing or classifier-initiated retry), in which case
      action init code tries to insert action to idr with index that was
      assigned on previous iteration. During retry the index can be reused
      by another action that was inserted concurrently, which causes
      unintended action sharing between filters.
      To fix the described race condition, save the action idr index to a temporary
      stack-allocated variable instead of in nla data.
      
      Fixes: 0190c1d4 ("net: sched: atomically check-allocate action")
      Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
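
The race described in this commit can be illustrated with a small userland model. This is only a sketch under stated assumptions: the `Idr` class, the `parm` dictionary standing in for the nla data, and both `init_action_*` functions are hypothetical names invented for the illustration, not kernel code.

```python
class Idr:
    """Toy index allocator standing in for the kernel's action IDR."""

    def __init__(self):
        self.slots = {}

    def check_alloc(self, index, owner):
        """Return (index, created); index == 0 means 'pick any free index'."""
        if index == 0:
            index = 1
            while index in self.slots:
                index += 1
        if index in self.slots:
            return index, False          # an existing action gets reused
        self.slots[index] = owner
        return index, True

    def remove(self, index):
        self.slots.pop(index, None)


def init_action_buggy(idr, parm, owner):
    # Writes the allocated index straight back into the message data.
    parm["index"], created = idr.check_alloc(parm["index"], owner)
    return parm["index"], created


def init_action_fixed(idr, parm, owner):
    # Works on a local copy, so the message data is never modified.
    index = parm["index"]
    index, created = idr.check_alloc(index, owner)
    return index, created


def replay(init_action):
    idr = Idr()
    parm = {"index": 0}                   # user asked for any free index
    index, _ = init_action(idr, parm, "filter A")
    idr.remove(index)                     # first attempt is rolled back
    idr.check_alloc(index, "filter B")    # concurrent insert takes the index
    index, created = init_action(idr, parm, "filter A")   # retry, same message
    return idr.slots[index], created
```

In this model, `replay(init_action_buggy)` ends with filter A attached to the action owned by filter B, while `replay(init_action_fixed)` allocates a fresh index on retry.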
  5. 30 Jul, 2019 1 commit
  6. 26 Jul, 2019 1 commit
  7. 22 Jul, 2019 1 commit
    • V
      net: sched: verify that q!=NULL before setting q->flags · 503d81d4
      Vlad Buslov committed
      In tc_new_tfilter() the q pointer can be NULL when adding a filter
      on a shared block. With the recent change that resets TCQ_F_CAN_BYPASS after
      filter creation, the following NULL pointer dereference happens when the parent
      block is shared:
      
      [  212.925060] BUG: kernel NULL pointer dereference, address: 0000000000000010
      [  212.925445] #PF: supervisor write access in kernel mode
      [  212.925709] #PF: error_code(0x0002) - not-present page
      [  212.925965] PGD 8000000827923067 P4D 8000000827923067 PUD 827924067 PMD 0
      [  212.926302] Oops: 0002 [#1] SMP KASAN PTI
      [  212.926539] CPU: 18 PID: 2617 Comm: tc Tainted: G    B             5.2.0+ #512
      [  212.926938] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  212.927364] RIP: 0010:tc_new_tfilter+0x698/0xd40
      [  212.927633] Code: 74 0d 48 85 c0 74 08 48 89 ef e8 03 aa 62 00 48 8b 84 24 a0 00 00 00 48 8d 78 10 48 89 44 24 18 e8 4d 0c 6b ff 48 8b 44 24 18 <83> 60 10 fb 48 85 ed 0f 85 3d fe ff ff e9 4f fe ff ff e8 81 26 f8
      [  212.928607] RSP: 0018:ffff88884fd5f5d8 EFLAGS: 00010296
      [  212.928905] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000
      [  212.929201] RDX: 0000000000000007 RSI: 0000000000000004 RDI: 0000000000000297
      [  212.929402] RBP: ffff88886bedd600 R08: ffffffffb91d4b51 R09: fffffbfff7616e4d
      [  212.929609] R10: fffffbfff7616e4c R11: ffffffffbb0b7263 R12: ffff88886bc61040
      [  212.929803] R13: ffff88884fd5f950 R14: ffffc900039c5000 R15: ffff88835e927680
      [  212.929999] FS:  00007fe7c50b6480(0000) GS:ffff88886f980000(0000) knlGS:0000000000000000
      [  212.930235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  212.930394] CR2: 0000000000000010 CR3: 000000085bd04002 CR4: 00000000001606e0
      [  212.930588] Call Trace:
      [  212.930682]  ? tc_del_tfilter+0xa40/0xa40
      [  212.930811]  ? __lock_acquire+0x5b5/0x2460
      [  212.930948]  ? find_held_lock+0x85/0xa0
      [  212.931081]  ? tc_del_tfilter+0xa40/0xa40
      [  212.931201]  rtnetlink_rcv_msg+0x4ab/0x5f0
      [  212.931332]  ? rtnl_dellink+0x490/0x490
      [  212.931454]  ? lockdep_hardirqs_on+0x260/0x260
      [  212.931589]  ? netlink_deliver_tap+0xab/0x5a0
      [  212.931717]  ? match_held_lock+0x1b/0x240
      [  212.931844]  netlink_rcv_skb+0xd0/0x200
      [  212.931958]  ? rtnl_dellink+0x490/0x490
      [  212.932079]  ? netlink_ack+0x440/0x440
      [  212.932205]  ? netlink_deliver_tap+0x161/0x5a0
      [  212.932335]  ? lock_downgrade+0x360/0x360
      [  212.932457]  ? lock_acquire+0xe5/0x210
      [  212.932579]  netlink_unicast+0x296/0x350
      [  212.932705]  ? netlink_attachskb+0x390/0x390
      [  212.932834]  ? _copy_from_iter_full+0xe0/0x3a0
      [  212.932976]  netlink_sendmsg+0x394/0x600
      [  212.937998]  ? netlink_unicast+0x350/0x350
      [  212.943033]  ? move_addr_to_kernel.part.0+0x90/0x90
      [  212.948115]  ? netlink_unicast+0x350/0x350
      [  212.953185]  sock_sendmsg+0x96/0xa0
      [  212.958099]  ___sys_sendmsg+0x482/0x520
      [  212.962881]  ? match_held_lock+0x1b/0x240
      [  212.967618]  ? copy_msghdr_from_user+0x250/0x250
      [  212.972337]  ? lock_downgrade+0x360/0x360
      [  212.976973]  ? rwlock_bug.part.0+0x60/0x60
      [  212.981548]  ? __mod_node_page_state+0x1f/0xa0
      [  212.986060]  ? match_held_lock+0x1b/0x240
      [  212.990567]  ? find_held_lock+0x85/0xa0
      [  212.994989]  ? do_user_addr_fault+0x349/0x5b0
      [  212.999387]  ? lock_downgrade+0x360/0x360
      [  213.003713]  ? find_held_lock+0x85/0xa0
      [  213.007972]  ? __fget_light+0xa1/0xf0
      [  213.012143]  ? sockfd_lookup_light+0x91/0xb0
      [  213.016165]  __sys_sendmsg+0xba/0x130
      [  213.020040]  ? __sys_sendmsg_sock+0xb0/0xb0
      [  213.023870]  ? handle_mm_fault+0x337/0x470
      [  213.027592]  ? page_fault+0x8/0x30
      [  213.031316]  ? lockdep_hardirqs_off+0xbe/0x100
      [  213.034999]  ? mark_held_locks+0x24/0x90
      [  213.038671]  ? do_syscall_64+0x1e/0xe0
      [  213.042297]  do_syscall_64+0x74/0xe0
      [  213.045828]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  213.049354] RIP: 0033:0x7fe7c527c7b8
      [  213.052792] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
      [  213.060269] RSP: 002b:00007ffc3f7908a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  213.064144] RAX: ffffffffffffffda RBX: 000000005d34716f RCX: 00007fe7c527c7b8
      [  213.068094] RDX: 0000000000000000 RSI: 00007ffc3f790910 RDI: 0000000000000003
      [  213.072109] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fe7c5340cc0
      [  213.076113] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000080
      [  213.080146] R13: 0000000000480640 R14: 0000000000000080 R15: 0000000000000000
      [  213.084147] Modules linked in: act_gact cls_flower sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc intel_rapl_msr intel_rapl_common sb_edac rdma_ucm rdma_cm x86_pkg_temp_thermal iw_cm intel_powerclamp ib_cm coretemp kvm_intel kvm irqbypass mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mlx5_core intel_cstate intel_uncore iTCO_wdt igb iTCO_vendor_support mlxfw mei_me ptp ses intel_rapl_perf mei pcspkr ipmi_ssif i2c_i801 joydev enclosure pps_core lpc_ich ioatdma wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas
      [  213.112326] CR2: 0000000000000010
      [  213.117429] ---[ end trace adb58eb0a4ee6283 ]---
      
      Verify that q pointer is not NULL before setting the 'flags' field.
      
      Fixes: 3f05e688 ("net_sched: unset TCQ_F_CAN_BYPASS when adding filters")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 20 Jul, 2019 2 commits
  9. 18 Jul, 2019 2 commits
    • C
      net_sched: unset TCQ_F_CAN_BYPASS when adding filters · 3f05e688
      Cong Wang committed
      For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
      notably fq_codel, it makes no sense to let packets bypass the TC
      filters we setup in any scenario, otherwise our packets steering
      policy could not be enforced.
      
      This can be reproduced easily with the following script:
      
       ip li add dev dummy0 type dummy
       ifconfig dummy0 up
       tc qd add dev dummy0 root fq_codel
       tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo
       tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo
       ping -I dummy0 192.168.112.1
      
      Without this patch, packets are sent directly to dummy0 without
      hitting any of the filters. With this patch, packets are redirected
      to loopback as expected.
      
      This fix is not perfect, it only unsets the flag but does not set it back
      because we have to save the information somewhere in the qdisc if we
      really want that. Note, both fq_codel and sfq clear this flag in their
      ->bind_tcf() but this is clearly not sufficient when we don't use any
      class ID.
      
      Fixes: 23624935 ("net_sched: TCQ_F_CAN_BYPASS generalization")
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Y
      net/sched: Make NET_ACT_CT depends on NF_NAT · f11fe1da
      YueHaibing committed
      If NF_NAT is m and NET_ACT_CT is y, build fails:
      
      net/sched/act_ct.o: In function `tcf_ct_act':
      act_ct.c:(.text+0x21ac): undefined reference to `nf_ct_nat_ext_add'
      act_ct.c:(.text+0x229a): undefined reference to `nf_nat_icmp_reply_translation'
      act_ct.c:(.text+0x233a): undefined reference to `nf_nat_setup_info'
      act_ct.c:(.text+0x234a): undefined reference to `nf_nat_alloc_null_binding'
      act_ct.c:(.text+0x237c): undefined reference to `nf_nat_packet'
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: b57dc7c1 ("net/sched: Introduce action ct")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 17 Jul, 2019 1 commit
  11. 13 Jul, 2019 1 commit
    • V
      net: sched: Fix NULL-pointer dereference in tc_indr_block_ing_cmd() · c1a970d0
      Vlad Buslov committed
      After the recent refactoring of the block offloads infrastructure, the indr_dev->block
      pointer is dereferenced before it is verified to be non-NULL. Example stack
      trace where this behavior leads to a NULL-pointer dereference error when
      creating a vxlan dev on a system with an mlx5 NIC with offloads enabled:
      
      [ 1157.852938] ==================================================================
      [ 1157.866877] BUG: KASAN: null-ptr-deref in tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1157.880877] Read of size 4 at addr 0000000000000090 by task ip/3829
      [ 1157.901637] CPU: 22 PID: 3829 Comm: ip Not tainted 5.2.0-rc6+ #488
      [ 1157.914438] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [ 1157.929031] Call Trace:
      [ 1157.938318]  dump_stack+0x9a/0xeb
      [ 1157.948362]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1157.960262]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1157.972082]  __kasan_report+0x176/0x192
      [ 1157.982513]  ? tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1157.994348]  kasan_report+0xe/0x20
      [ 1158.004324]  tc_indr_block_ing_cmd.isra.41+0x9c/0x160
      [ 1158.015950]  ? tcf_block_setup+0x430/0x430
      [ 1158.026558]  ? kasan_unpoison_shadow+0x30/0x40
      [ 1158.037464]  __tc_indr_block_cb_register+0x5f5/0xf20
      [ 1158.049288]  ? mlx5e_rep_indr_tc_block_unbind+0xa0/0xa0 [mlx5_core]
      [ 1158.062344]  ? tc_indr_block_dev_put.part.47+0x5c0/0x5c0
      [ 1158.074498]  ? rdma_roce_rescan_device+0x20/0x20 [ib_core]
      [ 1158.086580]  ? br_device_event+0x98/0x480 [bridge]
      [ 1158.097870]  ? strcmp+0x30/0x50
      [ 1158.107578]  mlx5e_nic_rep_netdevice_event+0xdd/0x180 [mlx5_core]
      [ 1158.120212]  notifier_call_chain+0x6d/0xa0
      [ 1158.130753]  register_netdevice+0x6fc/0x7e0
      [ 1158.141322]  ? netdev_change_features+0xa0/0xa0
      [ 1158.152218]  ? vxlan_config_apply+0x210/0x310 [vxlan]
      [ 1158.163593]  __vxlan_dev_create+0x2ad/0x520 [vxlan]
      [ 1158.174770]  ? vxlan_changelink+0x490/0x490 [vxlan]
      [ 1158.185870]  ? rcu_read_unlock+0x60/0x60 [vxlan]
      [ 1158.196798]  vxlan_newlink+0x99/0xf0 [vxlan]
      [ 1158.207303]  ? __vxlan_dev_create+0x520/0x520 [vxlan]
      [ 1158.218601]  ? rtnl_create_link+0x3d0/0x450
      [ 1158.228900]  __rtnl_newlink+0x8a7/0xb00
      [ 1158.238701]  ? stack_access_ok+0x35/0x80
      [ 1158.248450]  ? rtnl_link_unregister+0x1a0/0x1a0
      [ 1158.258735]  ? find_held_lock+0x6d/0xd0
      [ 1158.268379]  ? is_bpf_text_address+0x67/0xf0
      [ 1158.278330]  ? lock_acquire+0xc1/0x1f0
      [ 1158.287686]  ? is_bpf_text_address+0x5/0xf0
      [ 1158.297449]  ? is_bpf_text_address+0x86/0xf0
      [ 1158.307310]  ? kernel_text_address+0xec/0x100
      [ 1158.317155]  ? arch_stack_walk+0x92/0xe0
      [ 1158.326497]  ? __kernel_text_address+0xe/0x30
      [ 1158.336213]  ? unwind_get_return_address+0x2f/0x50
      [ 1158.346267]  ? create_prof_cpu_mask+0x20/0x20
      [ 1158.355936]  ? arch_stack_walk+0x92/0xe0
      [ 1158.365117]  ? stack_trace_save+0x8a/0xb0
      [ 1158.374272]  ? stack_trace_consume_entry+0x80/0x80
      [ 1158.384226]  ? match_held_lock+0x33/0x210
      [ 1158.393216]  ? kasan_unpoison_shadow+0x30/0x40
      [ 1158.402593]  rtnl_newlink+0x53/0x80
      [ 1158.410925]  rtnetlink_rcv_msg+0x3a5/0x600
      [ 1158.419777]  ? validate_linkmsg+0x400/0x400
      [ 1158.428620]  ? find_held_lock+0x6d/0xd0
      [ 1158.437117]  ? match_held_lock+0x1b/0x210
      [ 1158.445760]  ? validate_linkmsg+0x400/0x400
      [ 1158.454642]  netlink_rcv_skb+0xc7/0x1f0
      [ 1158.463150]  ? netlink_ack+0x470/0x470
      [ 1158.471538]  ? netlink_deliver_tap+0x1f3/0x5a0
      [ 1158.480607]  netlink_unicast+0x2ae/0x350
      [ 1158.489099]  ? netlink_attachskb+0x340/0x340
      [ 1158.497935]  ? _copy_from_iter_full+0xde/0x3b0
      [ 1158.506945]  ? __virt_addr_valid+0xb6/0xf0
      [ 1158.515578]  ? __check_object_size+0x159/0x240
      [ 1158.524515]  netlink_sendmsg+0x4d3/0x630
      [ 1158.532879]  ? netlink_unicast+0x350/0x350
      [ 1158.541400]  ? netlink_unicast+0x350/0x350
      [ 1158.549805]  sock_sendmsg+0x94/0xa0
      [ 1158.557561]  ___sys_sendmsg+0x49d/0x570
      [ 1158.565625]  ? copy_msghdr_from_user+0x210/0x210
      [ 1158.574457]  ? __fput+0x1e2/0x330
      [ 1158.581948]  ? __kasan_slab_free+0x130/0x180
      [ 1158.590407]  ? kmem_cache_free+0xb6/0x2d0
      [ 1158.598574]  ? mark_lock+0xc7/0x790
      [ 1158.606177]  ? task_work_run+0xcf/0x100
      [ 1158.614165]  ? exit_to_usermode_loop+0x102/0x110
      [ 1158.622954]  ? __lock_acquire+0x963/0x1ee0
      [ 1158.631199]  ? lockdep_hardirqs_on+0x260/0x260
      [ 1158.639777]  ? match_held_lock+0x1b/0x210
      [ 1158.647918]  ? lockdep_hardirqs_on+0x260/0x260
      [ 1158.656501]  ? match_held_lock+0x1b/0x210
      [ 1158.664643]  ? __fget_light+0xa6/0xe0
      [ 1158.672423]  ? __sys_sendmsg+0xd2/0x150
      [ 1158.680334]  __sys_sendmsg+0xd2/0x150
      [ 1158.688063]  ? __ia32_sys_shutdown+0x30/0x30
      [ 1158.696435]  ? lock_downgrade+0x2e0/0x2e0
      [ 1158.704541]  ? mark_held_locks+0x1a/0x90
      [ 1158.712611]  ? mark_held_locks+0x1a/0x90
      [ 1158.720619]  ? do_syscall_64+0x1e/0x2c0
      [ 1158.728530]  do_syscall_64+0x78/0x2c0
      [ 1158.736254]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 1158.745414] RIP: 0033:0x7f62d505cb87
      [ 1158.753070] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 48 89 f3 48
      [ 1158.780924] RSP: 002b:00007fffd9832268 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [ 1158.793204] RAX: ffffffffffffffda RBX: 000000005d26048f RCX: 00007f62d505cb87
      [ 1158.805111] RDX: 0000000000000000 RSI: 00007fffd98322d0 RDI: 0000000000000003
      [ 1158.817055] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
      [ 1158.828987] R10: 00007f62d50ce260 R11: 0000000000000246 R12: 0000000000000001
      [ 1158.840909] R13: 000000000067e540 R14: 0000000000000000 R15: 000000000067ed20
      [ 1158.852873] ==================================================================
      
      Introduce new function tcf_block_non_null_shared() that verifies block
      pointer before dereferencing it to obtain index. Use the function in
      tc_indr_block_ing_cmd() to prevent NULL pointer dereference.
      
      Fixes: 955bcb6e ("drivers: net: use flow block API")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 10 Jul, 2019 9 commits
  13. 09 Jul, 2019 1 commit
  14. 02 Jul, 2019 3 commits
  15. 30 Jun, 2019 4 commits
  16. 29 Jun, 2019 8 commits
    • V
      taprio: Adjust timestamps for TCP packets · 54002066
      Vedang Patel committed
      When the taprio qdisc is running in "txtime offload" mode, it will
      set the launchtime value (in skb->tstamp) for all the packets which do
      not have the SO_TXTIME socket option. But, the TCP packets already have
      this value set and it indicates the earliest departure time represented
      in CLOCK_MONOTONIC clock.
      
      We need to respect the timestamp set by the TCP subsystem. So, convert
      this time to the clock which taprio is using and ensure that the packet
      is not transmitted before the deadline set by TCP.
      Signed-off-by: Vedang Patel <vedang.patel@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
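
As a rough illustration of the rule in this commit message, the sketch below converts a TCP departure time from CLOCK_MONOTONIC to the taprio clock and never schedules the packet earlier than that. Modelling the conversion as a constant offset is an assumption made for the example, and the function and parameter names are hypothetical; all values are in nanoseconds.

```python
def mono_to_taprio(t_mono_ns, tk_offset_ns):
    """Convert a CLOCK_MONOTONIC time to the taprio clock (assumed: fixed offset)."""
    return t_mono_ns + tk_offset_ns


def clamp_to_tcp_deadline(taprio_txtime_ns, tcp_edt_mono_ns, tk_offset_ns):
    """Pick a transmit time that is not earlier than TCP's earliest departure time."""
    tcp_edt_ns = mono_to_taprio(tcp_edt_mono_ns, tk_offset_ns)
    return max(taprio_txtime_ns, tcp_edt_ns)
```

For example, with an offset of 37 ns, a taprio slot at 1000 ns and a TCP departure time of 990 ns (monotonic), the packet would be scheduled at 1027 ns rather than 1000 ns.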
    • V
      taprio: make clock reference conversions easier · 7ede7b03
      Vedang Patel committed
      Later in this series we will need to transform from
      CLOCK_MONOTONIC (used in TCP) to the clock reference used in TAPRIO.
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: Vedang Patel <vedang.patel@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      taprio: Add support for txtime-assist mode · 4cfd5779
      Vedang Patel committed
      Currently, we are seeing non-critical packets being transmitted outside of
      their timeslice. We can confirm that the packets are being dequeued at the
      right time. So, the delay is induced on the hardware side. The most likely
      reason is that the hardware queues are starving the lower priority queues.
      
      In order to improve the performance of taprio, we will be making use of the
      txtime feature provided by the ETF qdisc. For all the packets which do not
      have the SO_TXTIME option set, taprio will set the transmit timestamp (set
      in skb->tstamp) in this mode. TAPrio Qdisc will ensure that the transmit
      time for the packet is set to when the gate is open. If SO_TXTIME is set,
      the TAPrio qdisc will validate whether the timestamp (in skb->tstamp)
      occurs when the gate corresponding to skb's traffic class is open.
      
      The following two parameters are added to support this mode:
      - flags: used to enable txtime-assist mode. Will also be used to enable
        other modes (like hardware offloading) later.
      - txtime-delay: This indicates the minimum time it will take for the packet
        to hit the wire. This is useful in determining whether we can transmit
        the packet in the remaining time if the gate corresponding to the packet is
        currently open.
      
      An example configuration for enabling txtime-assist:
      
      tc qdisc replace dev eth0 parent root handle 100 taprio \
            num_tc 3 \
            map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
            queues 1@0 1@0 1@0 \
            base-time 1558653424279842568 \
            sched-entry S 01 300000 \
            sched-entry S 02 300000 \
            sched-entry S 04 400000 \
            flags 0x1 \
            txtime-delay 40000 \
            clockid CLOCK_TAI
      
      tc qdisc replace dev $IFACE parent 100:1 etf skip_sock_check \
            offload delta 200000 clockid CLOCK_TAI
      
      Note that all the traffic classes are mapped to the same queue.  This is
      only possible in taprio when txtime-assist is enabled. Also, note that the
      ETF Qdisc is enabled with offload mode set.
      
      In this mode, if the packet's traffic class is open and the complete packet
      can be transmitted, taprio will try to transmit the packet immediately.
      This will be done by setting skb->tstamp to current_time + the time delta
      indicated in the txtime-delay parameter. This parameter indicates the time
      taken (in software) for packet to reach the network adapter.
      
      If the packet cannot be transmitted in the current interval or if the
      packet's traffic is not currently transmitting, the skb->tstamp is set to
      the next available timestamp value. This is tracked in the next_launchtime
      parameter in the struct sched_entry.
      
      The behaviour w.r.t admin and oper schedules is not changed from what is
      present in software mode.
      
      The transmit time is already known in advance. So, we do not need the HR
      timers to advance the schedule and wakeup the dequeue side of taprio.  So,
      HR timer won't be run when this mode is enabled.
      Signed-off-by: Vedang Patel <vedang.patel@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
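
The timestamp choice described in this commit message can be sketched as follows. This is a simplified model, not the kernel implementation: the function name and all parameters are hypothetical, times are in nanoseconds, and `next_launchtime_ns` stands for the value the message says is tracked per schedule entry.

```python
def pick_txtime(now_ns, txtime_delay_ns, packet_duration_ns,
                gate_open, interval_end_ns, next_launchtime_ns):
    """Choose the transmit timestamp for one packet in txtime-assist mode."""
    # Earliest moment the packet could reach the wire if sent right away.
    earliest_ns = now_ns + txtime_delay_ns
    # Send immediately only if the gate is open and the whole packet
    # fits in what is left of the current interval.
    if gate_open and earliest_ns + packet_duration_ns <= interval_end_ns:
        return earliest_ns
    # Otherwise defer to the next slot available for this traffic class.
    return next_launchtime_ns
```

With `txtime_delay_ns=40`, a packet seen at 1000 ns inside an interval ending at 2000 ns gets 1040 ns; the same packet seen at 1900 ns no longer fits and is deferred to `next_launchtime_ns`.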
    • V
      taprio: Remove inline directive · 566af331
      Vedang Patel committed
      Remove inline directive from length_to_duration(). We will let the compiler
      make the decisions.
      Signed-off-by: Vedang Patel <vedang.patel@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      taprio: calculate cycle_time when schedule is installed · 037be037
      Vedang Patel committed
      The cycle time for a particular schedule is calculated only when it is first
      installed. So, it makes sense to just calculate it once right after the
      'cycle_time' parameter has been parsed and store it in cycle_time.
      Signed-off-by: Vedang Patel <vedang.patel@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      etf: Add skip_sock_check · d14d2b20
      Vedang Patel committed
      Currently, etf expects a socket with the SO_TXTIME option set for each packet
      it encounters. So, it will drop all other packets. But, in future
      commits we are planning to add functionality where the tstamp value will be set
      by another qdisc. Also, some packets which are generated from within the
      kernel (e.g. ICMP packets) do not have any socket associated with them.
      
      So, this commit adds support for skip_sock_check. When this option is set,
      etf will skip checking for a socket and other associated options for all
      skbs.
      Signed-off-by: Vedang Patel <vedang.patel@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      net: sched: protect against stack overflow in TC act_mirred · e2ca070f
      John Hurley committed
      TC hooks allow the application of filters and actions to packets at both
      ingress and egress of the network stack. It is possible, with poor
      configuration, that this can produce loops whereby an ingress hook calls
      a mirred egress action that has an egress hook that redirects back to
      the first ingress etc. The TC core classifier protects against loops when
      doing reclassifies but there is no protection against a packet looping
      between multiple hooks and recursively calling act_mirred. This can lead
      to stack overflow panics.
      
      Add a per CPU counter to act_mirred that is incremented for each recursive
      call of the action function when processing a packet. If a limit is passed
      then the packet is dropped and CPU counter reset.
      
      Note that this patch does not protect against loops in TC datapaths. Its
      aim is to prevent stack overflow kernel panics that can be a consequence
      of such loops.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      net: sched: refactor reinsert action · 720f22fe
      John Hurley committed
      The TC_ACT_REINSERT return type was added as an in-kernel only option to
      allow a packet ingress or egress redirect. This is used to avoid
      unnecessary skb clones in situations where they are not required. If a TC
      hook returns this code then the packet is 'reinserted' and no skb consume
      is carried out as no clone took place.
      
      This return type is only used in act_mirred. Rather than have the reinsert
      called from the main datapath, call it directly in act_mirred. Instead of
      returning TC_ACT_REINSERT, change the type to the new TC_ACT_CONSUMED
      which tells the caller that the packet has been stolen by another process
      and that no consume call is required.
      
      Moving all redirect calls to the act_mirred code is in preparation for
      tracking recursion created by act_mirred.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 24 Jun, 2019 1 commit