1. 14 7月, 2020 1 次提交
    • P
      net: sched: Pass qdisc reference in struct flow_block_offload · c40f4e50
      Petr Machata 提交于
      Previously, shared blocks were only relevant for the pseudo-qdiscs ingress
      and clsact. Recently, a qevent facility was introduced, which allows to
      bind blocks to well-defined slots of a qdisc instance. RED in particular
      got two qevents: early_drop and mark. Drivers that wish to offload these
      blocks will be sent the usual notification, and need to know which qdisc it
      is related to.
      
      To that end, extend flow_block_offload with a "sch" pointer, and initialize
      as appropriate. This prompts changes in the indirect block facility, which
      now tracks the scheduler in addition to the netdevice. Update signatures of
      several functions similarly.
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c40f4e50
  2. 04 7月, 2020 1 次提交
    • T
      sched: consistently handle layer3 header accesses in the presence of VLANs · d7bf2ebe
      Toke Høiland-Jørgensen 提交于
      There are a couple of places in net/sched/ that check skb->protocol and act
      on the value there. However, in the presence of VLAN tags, the value stored
      in skb->protocol can be inconsistent based on whether VLAN acceleration is
      enabled. The commit quoted in the Fixes tag below fixed the users of
      skb->protocol to use a helper that will always see the VLAN ethertype.
      
      However, most of the callers don't actually handle the VLAN ethertype, but
      expect to find the IP header type in the protocol field. This means that
      things like changing the ECN field, or parsing diffserv values, stops
      working if there's a VLAN tag, or if there are multiple nested VLAN
      tags (QinQ).
      
      To fix this, change the helper to take an argument that indicates whether
      the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
      make sure to skip all of them, so behaviour is consistent even in QinQ
      mode.
      
      To make the helper usable from the ECN code, move it to if_vlan.h instead
      of pkt_sched.h.
      
      v3:
      - Remove empty lines
      - Move vlan variable definitions inside loop in skb_protocol()
      - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
        bpf_skb_ecn_set_ce()
      
      v2:
      - Use eth_type_vlan() helper in skb_protocol()
      - Also fix code that reads skb->protocol directly
      - Change a couple of 'if/else if' statements to switch constructs to avoid
        calling the helper twice
      Reported-by: NIlya Ponetayev <i.ponetaev@ndmsystems.com>
      Fixes: d8b9605d ("net: sched: fix skb->protocol use in case of accelerated vlan path")
      Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7bf2ebe
  3. 30 6月, 2020 2 次提交
    • P
      net:qos: police action offloading parameter 'burst' change to the original value · 5f035af7
      Po Liu 提交于
      Since 'tcfp_burst' with TICK factor, driver side always need to recover
      it to the original value, this patch moves the generic calculation and
      recover to the 'burst' original value before offloading to device driver.
      Signed-off-by: NPo Liu <po.liu@nxp.com>
      Acked-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f035af7
    • P
      net: sched: Introduce helpers for qevent blocks · 3625750f
      Petr Machata 提交于
      Qevents are attach points for TC blocks, where filters can be put that are
      executed when "interesting events" take place in a qdisc. The data to keep
      and the functions to invoke to maintain a qevent will be largely the same
      between qevents. Therefore introduce sched-wide helpers for qevent
      management.
      
      Currently, similarly to ingress and egress blocks of clsact pseudo-qdisc,
      blocks attachment cannot be changed after the qdisc is created. To that
      end, add a helper tcf_qevent_validate_change(), which verifies whether
      block index attribute is not attached, or if it is, whether its value
      matches the current one (i.e. there is no material change).
      
      The function tcf_qevent_handle() should be invoked when qdisc hits the
      "interesting event" corresponding to a block. This function releases root
      lock for the duration of executing the attached filters, to allow packets
      generated through user actions (notably mirred) to be reinserted to the
      same qdisc tree.
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3625750f
  4. 25 6月, 2020 2 次提交
  5. 20 6月, 2020 2 次提交
    • W
      net/sched: cls_api: fix nooffloaddevcnt warning dmesg log · 3c005110
      wenxu 提交于
      The block->nooffloaddevcnt should always count for indr block.
      even the indr block offload successful. The representor maybe
      gone away and the ingress qdisc can work in software mode.
      
      block->nooffloaddevcnt warning with following dmesg log:
      
      [  760.667058] #####################################################
      [  760.668186] ## TEST test-ecmp-add-vxlan-encap-disable-sriov.sh ##
      [  760.669179] #####################################################
      [  761.780655] :test: Fedora 30 (Thirty)
      [  761.783794] :test: Linux reg-r-vrt-018-180 5.7.0+
      [  761.822890] :test: NIC ens1f0 FW 16.26.6000 PCI 0000:81:00.0 DEVICE 0x1019 ConnectX-5 Ex
      [  761.860244] mlx5_core 0000:81:00.0 ens1f0: Link up
      [  761.880693] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f0: link becomes ready
      [  762.059732] mlx5_core 0000:81:00.1 ens1f1: Link up
      [  762.234341] :test: unbind vfs of ens1f0
      [  762.257825] :test: Change ens1f0 eswitch (0000:81:00.0) mode to switchdev
      [  762.291363] :test: unbind vfs of ens1f1
      [  762.306914] :test: Change ens1f1 eswitch (0000:81:00.1) mode to switchdev
      [  762.309237] mlx5_core 0000:81:00.1: E-Switch: Disable: mode(LEGACY), nvfs(2), active vports(3)
      [  763.282598] mlx5_core 0000:81:00.1: E-Switch: Supported tc offload range - chains: 4294967294, prios: 4294967295
      [  763.362825] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
      [  763.444465] mlx5_core 0000:81:00.1 ens1f1: renamed from eth0
      [  763.460088] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
      [  763.502586] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
      [  763.552429] ens1f1_0: renamed from eth0
      [  763.569569] mlx5_core 0000:81:00.1: E-Switch: Enable: mode(OFFLOADS), nvfs(2), active vports(3)
      [  763.629694] ens1f1_1: renamed from eth1
      [  764.631552] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f1_0: link becomes ready
      [  764.670841] :test: unbind vfs of ens1f0
      [  764.681966] :test: unbind vfs of ens1f1
      [  764.726762] mlx5_core 0000:81:00.0 ens1f0: Link up
      [  764.766511] mlx5_core 0000:81:00.1 ens1f1: Link up
      [  764.797325] :test: Add multipath vxlan encap rule and disable sriov
      [  764.798544] :test: config multipath route
      [  764.812732] mlx5_core 0000:81:00.0: lag map port 1:2 port 2:2
      [  764.874556] mlx5_core 0000:81:00.0: modify lag map port 1:1 port 2:2
      [  765.603681] :test: OK
      [  765.659048] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f1_1: link becomes ready
      [  765.675085] :test: verify rule in hw
      [  765.694237] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f0: link becomes ready
      [  765.711892] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f1: link becomes ready
      [  766.979230] :test: OK
      [  768.125419] :test: OK
      [  768.127519] :test: - disable sriov ens1f1
      [  768.131160] pci 0000:81:02.2: Removing from iommu group 75
      [  768.132646] pci 0000:81:02.3: Removing from iommu group 76
      [  769.179749] mlx5_core 0000:81:00.1: E-Switch: Disable: mode(OFFLOADS), nvfs(2), active vports(3)
      [  769.455627] mlx5_core 0000:81:00.0: modify lag map port 1:1 port 2:1
      [  769.703990] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
      [  769.988637] mlx5_core 0000:81:00.1 ens1f1: renamed from eth0
      [  769.990022] :test: - disable sriov ens1f0
      [  769.994922] pci 0000:81:00.2: Removing from iommu group 73
      [  769.997048] pci 0000:81:00.3: Removing from iommu group 74
      [  771.035813] mlx5_core 0000:81:00.0: E-Switch: Disable: mode(OFFLOADS), nvfs(2), active vports(3)
      [  771.339091] ------------[ cut here ]------------
      [  771.340812] WARNING: CPU: 6 PID: 3448 at net/sched/cls_api.c:749 tcf_block_offload_unbind.isra.0+0x5c/0x60
      [  771.341728] Modules linked in: act_mirred act_tunnel_key cls_flower dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache tun bridge stp llc sunrpc rdma_ucm rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mlxfw act_ct nf_flow_table kvm_intel nf_nat kvm nf_conntrack irqbypass crct10dif_pclmul igb crc32_pclmul nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 crc32c_intel ghash_clmulni_intel ptp ipmi_ssif intel_cstate pps_c
      ore ses intel_uncore mei_me iTCO_wdt joydev ipmi_si iTCO_vendor_support i2c_i801 enclosure mei ioatdma dca lpc_ich wmi ipmi_devintf pcspkr acpi_power_meter ipmi_msghandler acpi_pad ast i2c_algo_bit drm_vram_helper drm_kms_helper drm_ttm_helper ttm drm mpt3sas raid_class scsi_transport_sas
      [  771.347818] CPU: 6 PID: 3448 Comm: test-ecmp-add-v Not tainted 5.7.0+ #1146
      [  771.348727] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  771.349646] RIP: 0010:tcf_block_offload_unbind.isra.0+0x5c/0x60
      [  771.350553] Code: 4a fd ff ff 83 f8 a1 74 0e 5b 4c 89 e7 5d 41 5c 41 5d e9 07 93 89 ff 8b 83 a0 00 00 00 8d 50 ff 89 93 a0 00 00 00 85 c0 75 df <0f> 0b eb db 0f 1f 44 00 00 41 57 41 56 41 55 41 89 cd 41 54 49 89
      [  771.352420] RSP: 0018:ffffb33144cd3b00 EFLAGS: 00010246
      [  771.353353] RAX: 0000000000000000 RBX: ffff8b37cf4b2800 RCX: 0000000000000000
      [  771.354294] RDX: 00000000ffffffff RSI: ffff8b3b9aad0000 RDI: ffffffff8d5c6e20
      [  771.355245] RBP: ffff8b37eb546948 R08: ffffffffc0b7a348 R09: ffff8b3b9aad0000
      [  771.356189] R10: 0000000000000001 R11: ffff8b3ba7a0a1c0 R12: ffff8b37cf4b2850
      [  771.357123] R13: ffff8b3b9aad0000 R14: ffff8b37cf4b2820 R15: ffff8b37cf4b2820
      [  771.358039] FS:  00007f8a19b6e740(0000) GS:ffff8b3befa00000(0000) knlGS:0000000000000000
      [  771.358965] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  771.359885] CR2: 00007f3afb91c1a0 CR3: 000000045133c004 CR4: 00000000001606e0
      [  771.360825] Call Trace:
      [  771.361764]  __tcf_block_put+0x84/0x150
      [  771.362712]  ingress_destroy+0x1b/0x20 [sch_ingress]
      [  771.363658]  qdisc_destroy+0x3e/0xc0
      [  771.364594]  dev_shutdown+0x7a/0xa5
      [  771.365522]  rollback_registered_many+0x20d/0x530
      [  771.366458]  ? netdev_upper_dev_unlink+0x15d/0x1c0
      [  771.367387]  unregister_netdevice_many.part.0+0xf/0x70
      [  771.368310]  vxlan_netdevice_event+0xa4/0x110 [vxlan]
      [  771.369454]  notifier_call_chain+0x4c/0x70
      [  771.370579]  rollback_registered_many+0x2f5/0x530
      [  771.371719]  rollback_registered+0x56/0x90
      [  771.372843]  unregister_netdevice_queue+0x73/0xb0
      [  771.373982]  unregister_netdev+0x18/0x20
      [  771.375168]  mlx5e_vport_rep_unload+0x56/0xc0 [mlx5_core]
      [  771.376327]  esw_offloads_disable+0x81/0x90 [mlx5_core]
      [  771.377512]  mlx5_eswitch_disable_locked.cold+0xcb/0x1af [mlx5_core]
      [  771.378679]  mlx5_eswitch_disable+0x44/0x60 [mlx5_core]
      [  771.379822]  mlx5_device_disable_sriov+0xad/0xb0 [mlx5_core]
      [  771.380968]  mlx5_core_sriov_configure+0xc1/0xe0 [mlx5_core]
      [  771.382087]  sriov_numvfs_store+0xfc/0x130
      [  771.383195]  kernfs_fop_write+0xce/0x1b0
      [  771.384302]  vfs_write+0xb6/0x1a0
      [  771.385410]  ksys_write+0x5f/0xe0
      [  771.386500]  do_syscall_64+0x5b/0x1d0
      [  771.387569]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 0fdcf78d ("net: use flow_indr_dev_setup_offload()")
      Signed-off-by: Nwenxu <wenxu@ucloud.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c005110
    • W
      net: flow_offload: fix flow_indr_dev_unregister path · a1db2178
      wenxu 提交于
      If the representor is removed, then identify the indirect flow_blocks
      that need to be removed by the release callback and the port representor
      structure. To identify the port representor structure, a new
      indr.cb_priv field needs to be introduced. The flow_block also needs to
      be removed from the driver list from the cleanup path.
      
      Fixes: 1fac52da ("net: flow_offload: consolidate indirect flow_block infrastructure")
      Signed-off-by: Nwenxu <wenxu@ucloud.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1db2178
  6. 02 6月, 2020 3 次提交
  7. 16 5月, 2020 2 次提交
  8. 07 5月, 2020 1 次提交
    • P
      net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE · 16f80360
      Pablo Neira Ayuso 提交于
      This patch adds FLOW_ACTION_HW_STATS_DONT_CARE which tells the driver
      that the frontend does not need counters, this hw stats type request
      never fails. The FLOW_ACTION_HW_STATS_DISABLED type explicitly requests
      the driver to disable the stats, however, if the driver cannot disable
      counters, it bails out.
      
      TCA_ACT_HW_STATS_* maintains the 1:1 mapping with FLOW_ACTION_HW_STATS_*
      except by disabled which is mapped to FLOW_ACTION_HW_STATS_DISABLED
      (this is 0 in tc). Add tc_act_hw_stats() to perform the mapping between
      TCA_ACT_HW_STATS_* and FLOW_ACTION_HW_STATS_*.
      
      Fixes: 319a1d19 ("flow_offload: check for basic action hw stats type")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16f80360
  9. 05 5月, 2020 1 次提交
    • C
      net_sched: fix tcm_parent in tc filter dump · a7df4870
      Cong Wang 提交于
      When we tell kernel to dump filters from root (ffff:ffff),
      those filters on ingress (ffff:0000) are matched, but their
      true parents must be dumped as they are. However, kernel
      dumps just whatever we tell it, that is either ffff:ffff
      or ffff:0000:
      
       $ nl-cls-list --dev=dummy0 --parent=root
       cls basic dev dummy0 id none parent root prio 49152 protocol ip match-all
       cls basic dev dummy0 id :1 parent root prio 49152 protocol ip match-all
       $ nl-cls-list --dev=dummy0 --parent=ffff:
       cls basic dev dummy0 id none parent ffff: prio 49152 protocol ip match-all
       cls basic dev dummy0 id :1 parent ffff: prio 49152 protocol ip match-all
      
      This is confusing and misleading, more importantly this is
      a regression since 4.15, so the old behavior must be restored.
      
      And, when tc filters are installed on a tc class, the parent
      should be the classid, rather than the qdisc handle. Commit
      edf6711c ("net: sched: remove classid and q fields from tcf_proto")
      removed the classid we save for filters, we can just restore
      this classid in tcf_block.
      
      Steps to reproduce this:
       ip li set dev dummy0 up
       tc qd add dev dummy0 ingress
       tc filter add dev dummy0 parent ffff: protocol arp basic action pass
       tc filter show dev dummy0 root
      
      Before this patch:
       filter protocol arp pref 49152 basic
       filter protocol arp pref 49152 basic handle 0x1
      	action order 1: gact action pass
      	 random type none pass val 0
      	 index 1 ref 1 bind 1
      
      After this patch:
       filter parent ffff: protocol arp pref 49152 basic
       filter parent ffff: protocol arp pref 49152 basic handle 0x1
       	action order 1: gact action pass
       	 random type none pass val 0
      	 index 1 ref 1 bind 1
      
      Fixes: a10fa201 ("net: sched: propagate q and parent from caller down to tcf_fill_node")
      Fixes: edf6711c ("net: sched: remove classid and q fields from tcf_proto")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a7df4870
  10. 02 5月, 2020 1 次提交
  11. 25 4月, 2020 1 次提交
  12. 08 4月, 2020 1 次提交
  13. 28 3月, 2020 1 次提交
  14. 24 3月, 2020 1 次提交
  15. 20 3月, 2020 1 次提交
  16. 19 3月, 2020 1 次提交
    • P
      net: sched: Fix hw_stats_type setting in pedit loop · 2c4b58dc
      Petr Machata 提交于
      In the commit referenced below, hw_stats_type of an entry is set for every
      entry that corresponds to a pedit action. However, the assignment is only
      done after the entry pointer is bumped, and therefore could overwrite
      memory outside of the entries array.
      
      The reason for this positioning may have been that the current entry's
      hw_stats_type is already set above, before the action-type dispatch.
      However, if there are no more actions, the assignment is wrong. And if
      there are, the next round of the for_each_action loop will make the
      assignment before the action-type dispatch anyway.
      
      Therefore fix this issue by simply reordering the two lines.
      
      Fixes: 74522e7b ("net: sched: set the hw_stats_type in pedit loop")
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c4b58dc
  17. 18 3月, 2020 1 次提交
  18. 16 3月, 2020 1 次提交
  19. 13 3月, 2020 1 次提交
  20. 09 3月, 2020 1 次提交
  21. 26 2月, 2020 1 次提交
  22. 20 2月, 2020 4 次提交
  23. 18 2月, 2020 2 次提交
  24. 23 1月, 2020 1 次提交
    • E
      net_sched: use validated TCA_KIND attribute in tc_new_tfilter() · 36d79af7
      Eric Dumazet 提交于
      sysbot found another issue in tc_new_tfilter().
      We probably should use @name which contains the sanitized
      version of TCA_KIND.
      
      BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:608 [inline]
      BUG: KMSAN: uninit-value in string+0x522/0x690 lib/vsprintf.c:689
      CPU: 1 PID: 10753 Comm: syz-executor.1 Not tainted 5.5.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
       string_nocheck lib/vsprintf.c:608 [inline]
       string+0x522/0x690 lib/vsprintf.c:689
       vsnprintf+0x207d/0x31b0 lib/vsprintf.c:2574
       __request_module+0x2ad/0x11c0 kernel/kmod.c:143
       tcf_proto_lookup_ops+0x241/0x720 net/sched/cls_api.c:139
       tcf_proto_create net/sched/cls_api.c:262 [inline]
       tc_new_tfilter+0x2a4e/0x5010 net/sched/cls_api.c:2058
       rtnetlink_rcv_msg+0xcb7/0x1570 net/core/rtnetlink.c:5415
       netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5442
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:639 [inline]
       sock_sendmsg net/socket.c:659 [inline]
       ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330
       ___sys_sendmsg net/socket.c:2384 [inline]
       __sys_sendmsg+0x451/0x5f0 net/socket.c:2417
       __do_sys_sendmsg net/socket.c:2426 [inline]
       __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
       do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45b349
      Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f88b3948c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f88b39496d4 RCX: 000000000045b349
      RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003
      RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 000000000000099f R14: 00000000004cb163 R15: 000000000075bfd4
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
       kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
       kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82
       slab_alloc_node mm/slub.c:2774 [inline]
       __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4382
       __kmalloc_reserve net/core/skbuff.c:141 [inline]
       __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209
       alloc_skb include/linux/skbuff.h:1049 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline]
       netlink_sendmsg+0x7d3/0x14d0 net/netlink/af_netlink.c:1892
       sock_sendmsg_nosec net/socket.c:639 [inline]
       sock_sendmsg net/socket.c:659 [inline]
       ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330
       ___sys_sendmsg net/socket.c:2384 [inline]
       __sys_sendmsg+0x451/0x5f0 net/socket.c:2417
       __do_sys_sendmsg net/socket.c:2426 [inline]
       __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
       do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 6f96c3c6 ("net_sched: fix backward compatibility for TCA_KIND")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      36d79af7
  25. 31 12月, 2019 1 次提交
    • D
      net/sched: add delete_empty() to filters and use it in cls_flower · a5b72a08
      Davide Caratti 提交于
      Revert "net/sched: cls_u32: fix refcount leak in the error path of
      u32_change()", and fix the u32 refcount leak in a more generic way that
      preserves the semantic of rule dumping.
      On tc filters that don't support lockless insertion/removal, there is no
      need to guard against concurrent insertion when a removal is in progress.
      Therefore, for most of them we can avoid a full walk() when deleting, and
      just decrease the refcount, like it was done on older Linux kernels.
      This fixes situations where walk() was wrongly detecting a non-empty
      filter, like it happened with cls_u32 in the error path of change(), thus
      leading to failures in the following tdc selftests:
      
       6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
       6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
       74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id
      
      On cls_flower, and on (future) lockless filters, this check is necessary:
      move all the check_empty() logic in a callback so that each filter
      can have its own implementation. For cls_flower, it's sufficient to check
      if no IDRs have been allocated.
      
      This reverts commit 275c44aa.
      
      Changes since v1:
       - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
         is used, thanks to Vlad Buslov
       - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
       - squash revert and new fix in a single patch, to be nice with bisect
         tests that run tdc on u32 filter, thanks to Dave Miller
      
      Fixes: 275c44aa ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
      Fixes: 6676d5e4 ("net: sched: set dedicated tcf_walker flag when tp is empty")
      Suggested-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Suggested-by: NVlad Buslov <vladbu@mellanox.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Reviewed-by: NVlad Buslov <vladbu@mellanox.com>
      Tested-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5b72a08
  26. 08 12月, 2019 1 次提交
    • E
      net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add() · 2dd5616e
      Eric Dumazet 提交于
      Use the new tcf_proto_check_kind() helper to make sure user
      provided value is well formed.
      
      BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:606 [inline]
      BUG: KMSAN: uninit-value in string+0x4be/0x600 lib/vsprintf.c:668
      CPU: 0 PID: 12358 Comm: syz-executor.1 Not tainted 5.4.0-rc8-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
       __msan_warning+0x64/0xc0 mm/kmsan/kmsan_instr.c:245
       string_nocheck lib/vsprintf.c:606 [inline]
       string+0x4be/0x600 lib/vsprintf.c:668
       vsnprintf+0x218f/0x3210 lib/vsprintf.c:2510
       __request_module+0x2b1/0x11c0 kernel/kmod.c:143
       tcf_proto_lookup_ops+0x171/0x700 net/sched/cls_api.c:139
       tc_chain_tmplt_add net/sched/cls_api.c:2730 [inline]
       tc_ctl_chain+0x1904/0x38a0 net/sched/cls_api.c:2850
       rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5224
       netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5242
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg net/socket.c:657 [inline]
       ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
       __sys_sendmsg net/socket.c:2356 [inline]
       __do_sys_sendmsg net/socket.c:2365 [inline]
       __se_sys_sendmsg+0x305/0x460 net/socket.c:2363
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363
       do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45a649
      Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f0790795c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000000000045a649
      RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000006
      RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f07907966d4
      R13: 00000000004c8db5 R14: 00000000004df630 R15: 00000000ffffffff
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
       kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
       kmsan_slab_alloc+0x97/0x100 mm/kmsan/kmsan_hooks.c:86
       slab_alloc_node mm/slub.c:2773 [inline]
       __kmalloc_node_track_caller+0xe27/0x11a0 mm/slub.c:4381
       __kmalloc_reserve net/core/skbuff.c:141 [inline]
       __alloc_skb+0x306/0xa10 net/core/skbuff.c:209
       alloc_skb include/linux/skbuff.h:1049 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline]
       netlink_sendmsg+0x783/0x1330 net/netlink/af_netlink.c:1892
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg net/socket.c:657 [inline]
       ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
       __sys_sendmsg net/socket.c:2356 [inline]
       __do_sys_sendmsg net/socket.c:2365 [inline]
       __se_sys_sendmsg+0x305/0x460 net/socket.c:2363
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363
       do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 6f96c3c6 ("net_sched: fix backward compatibility for TCA_KIND")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2dd5616e
  27. 07 12月, 2019 2 次提交
    • J
      net: sched: allow indirect blocks to bind to clsact in TC · 25a443f7
      John Hurley 提交于
      When a device is bound to a clsact qdisc, bind events are triggered to
      registered drivers for both ingress and egress. However, if a driver
      registers to such a device using the indirect block routines then it is
      assumed that it is only interested in ingress offload and so only replays
      ingress bind/unbind messages.
      
      The NFP driver supports the offload of some egress filters when
      registering to a block with qdisc of type clsact. However, on unregister,
      if the block is still active, it will not receive an unbind egress
      notification which can prevent proper cleanup of other registered
      callbacks.
      
      Modify the indirect block callback command in TC to send messages of
      ingress and/or egress bind depending on the qdisc in use. NFP currently
      supports egress offload for TC flower offload so the changes are only
      added to TC.
      
      Fixes: 4d12ba42 ("nfp: flower: allow offloading of matches on 'internal' ports")
      Signed-off-by: NJohn Hurley <john.hurley@netronome.com>
      Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      25a443f7
    • J
      net: core: rename indirect block ingress cb function · dbad3408
      John Hurley 提交于
      With indirect blocks, a driver can register for callbacks from a device
      that is does not 'own', for example, a tunnel device. When registering to
      or unregistering from a new device, a callback is triggered to generate
      a bind/unbind event. This, in turn, allows the driver to receive any
      existing rules or to properly clean up installed rules.
      
      When first added, it was assumed that all indirect block registrations
      would be for ingress offloads. However, the NFP driver can, in some
      instances, support clsact qdisc binds for egress offload.
      
      Change the name of the indirect block callback command in flow_offload to
      remove the 'ingress' identifier from it. While this does not change
      functionality, a follow up patch will implement a more more generic
      callback than just those currently just supporting ingress offload.
      
      Fixes: 4d12ba42 ("nfp: flower: allow offloading of matches on 'internal' ports")
      Signed-off-by: NJohn Hurley <john.hurley@netronome.com>
      Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dbad3408
  28. 06 11月, 2019 1 次提交
    • J
      net: sched: prevent duplicate flower rules from tcf_proto destroy race · 59eb87cb
      John Hurley 提交于
      When a new filter is added to cls_api, the function
      tcf_chain_tp_insert_unique() looks up the protocol/priority/chain to
      determine if the tcf_proto is duplicated in the chain's hashtable. It then
      creates a new entry or continues with an existing one. In cls_flower, this
      allows the function fl_ht_insert_unque to determine if a filter is a
      duplicate and reject appropriately, meaning that the duplicate will not be
      passed to drivers via the offload hooks. However, when a tcf_proto is
      destroyed it is removed from its chain before a hardware remove hook is
      hit. This can lead to a race whereby the driver has not received the
      remove message but duplicate flows can be accepted. This, in turn, can
      lead to the offload driver receiving incorrect duplicate flows and out of
      order add/delete messages.
      
      Prevent duplicates by utilising an approach suggested by Vlad Buslov. A
      hash table per block stores each unique chain/protocol/prio being
      destroyed. This entry is only removed when the full destroy (and hardware
      offload) has completed. If a new flow is being added with the same
      identiers as a tc_proto being detroyed, then the add request is replayed
      until the destroy is complete.
      
      Fixes: 8b64678e ("net: sched: refactor tp insert/delete for concurrent execution")
      Signed-off-by: NJohn Hurley <john.hurley@netronome.com>
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Reviewed-by: NSimon Horman <simon.horman@netronome.com>
      Reported-by: NLouis Peens <louis.peens@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59eb87cb
  29. 09 10月, 2019 1 次提交