1. 09 4月, 2021 2 次提交
  2. 03 4月, 2021 1 次提交
  3. 31 3月, 2021 2 次提交
    • Y
      sch_htb: fix null pointer dereference on a null new_q · ae81feb7
      Yunjian Wang 提交于
      sch_htb: fix null pointer dereference on a null new_q
      
      Currently if new_q is null, the null new_q pointer will be
      dereference when 'q->offload' is true. Fix this by adding
      a braces around htb_parent_to_leaf_offload() to avoid it.
      
      Addresses-Coverity: ("Dereference after null check")
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Signed-off-by: NYunjian Wang <wangyunjian@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae81feb7
    • K
      net: sched: bump refcount for new action in ACT replace mode · 6855e821
      Kumar Kartikeya Dwivedi 提交于
      Currently, action creation using ACT API in replace mode is buggy.
      When invoking for non-existent action index 42,
      
      	tc action replace action bpf obj foo.o sec <xyz> index 42
      
      kernel creates the action, fills up the netlink response, and then just
      deletes the action after notifying userspace.
      
      	tc action show action bpf
      
      doesn't list the action.
      
      This happens due to the following sequence when ovr = 1 (replace mode)
      is enabled:
      
      tcf_idr_check_alloc is used to atomically check and either obtain
      reference for existing action at index, or reserve the index slot using
      a dummy entry (ERR_PTR(-EBUSY)).
      
      This is necessary as pointers to these actions will be held after
      dropping the idrinfo lock, so bumping the reference count is necessary
      as we need to insert the actions, and notify userspace by dumping their
      attributes. Finally, we drop the reference we took using the
      tcf_action_put_many call in tcf_action_add. However, for the case where
      a new action is created due to free index, its refcount remains one.
      This when paired with the put_many call leads to the kernel setting up
      the action, notifying userspace of its creation, and then tearing it
      down. For existing actions, the refcount is still held so they remain
      unaffected.
      
      Fortunately due to rtnl_lock serialization requirement, such an action
      with refcount == 1 will not be concurrently deleted by anything else, at
      best CLS API can move its refcount up and down by binding to it after it
      has been published from tcf_idr_insert_many. Since refcount is atleast
      one until put_many call, CLS API cannot delete it. Also __tcf_action_put
      release path already ensures deterministic outcome (either new action
      will be created or existing action will be reused in case CLS API tries
      to bind to action concurrently) due to idr lock serialization.
      
      We fix this by making refcount of newly created actions as 2 in ACT API
      replace mode. A relaxed store will suffice as visibility is ensured only
      after the tcf_idr_insert_many call.
      
      Note that in case of creation or overwriting using CLS API only (i.e.
      bind = 1), overwriting existing action object is not allowed, and any
      such request is silently ignored (without error).
      
      The refcount bump that occurs in tcf_idr_check_alloc call there for
      existing action will pair with tcf_exts_destroy call made from the
      owner module for the same action. In case of action creation, there
      is no existing action, so no tcf_exts_destroy callback happens.
      
      This means no code changes for CLS API.
      
      Fixes: cae422f3 ("net: sched: use reference counting action init")
      Signed-off-by: NKumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6855e821
  4. 24 3月, 2021 1 次提交
    • M
      net/sched: act_ct: clear post_ct if doing ct_clear · 8ca1b090
      Marcelo Ricardo Leitner 提交于
      Invalid detection works with two distinct moments: act_ct tries to find
      a conntrack entry and set post_ct true, indicating that that was
      attempted. Then, when flow dissector tries to dissect CT info and no
      entry is there, it knows that it was tried and no entry was found, and
      synthesizes/sets
                        key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
                                        TCA_FLOWER_KEY_CT_FLAGS_INVALID;
      mimicing what OVS does.
      
      OVS has this a bit more streamlined, as it recomputes the key after
      trying to find a conntrack entry for it.
      
      Issue here is, when we have 'tc action ct clear', it didn't clear
      post_ct, causing a subsequent match on 'ct_state -trk' to fail, due to
      the above. The fix, thus, is to clear it.
      
      Reproducer rules:
      tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 0 \
      	protocol ip flower ip_proto tcp ct_state -trk \
      	action ct zone 1 pipe \
      	action goto chain 2
      tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 2 \
      	protocol ip flower \
      	action ct clear pipe \
      	action goto chain 4
      tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 4 \
      	protocol ip flower ct_state -trk \
      	action mirred egress redirect dev enp130s0f1np1_0
      
      With the fix, the 3rd rule matches, like it does with OVS kernel
      datapath.
      
      Fixes: 7baf2429 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: Nwenxu <wenxu@ucloud.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ca1b090
  5. 18 3月, 2021 1 次提交
    • W
      net/sched: cls_flower: fix only mask bit check in the validate_ct_state · afa536d8
      wenxu 提交于
      The ct_state validate should not only check the mask bit and also
      check mask_bit & key_bit..
      For the +new+est case example, The 'new' and 'est' bits should be
      set in both state_mask and state flags. Or the -new-est case also
      will be reject by kernel.
      When Openvswitch with two flows
      ct_state=+trk+new,action=commit,forward
      ct_state=+trk+est,action=forward
      
      A packet go through the kernel  and the contrack state is invalid,
      The ct_state will be +trk-inv. Upcall to the ovs-vswitchd, the
      finally dp action will be drop with -new-est+trk.
      
      Fixes: 1bcc51ac ("net/sched: cls_flower: Reject invalid ct_state flags rules")
      Fixes: 3aed8b63 ("net/sched: cls_flower: validate ct_state for invalid and reply flags")
      Signed-off-by: Nwenxu <wenxu@ucloud.cn>
      Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      afa536d8
  6. 17 3月, 2021 1 次提交
  7. 12 3月, 2021 2 次提交
    • M
      sch_htb: Fix offload cleanup in htb_destroy on htb_init failure · fb3a3e37
      Maxim Mikityanskiy 提交于
      htb_init may fail to do the offload if it's not supported or if a
      runtime error happens when allocating direct qdiscs. In those cases
      TC_HTB_CREATE command is not sent to the driver, however, htb_destroy
      gets called anyway and attempts to send TC_HTB_DESTROY.
      
      It shouldn't happen, because the driver didn't receive TC_HTB_CREATE,
      and also because the driver may not support ndo_setup_tc at all, while
      q->offload is true, and htb_destroy mistakenly thinks the offload is
      supported. Trying to call ndo_setup_tc in the latter case will lead to a
      NULL pointer dereference.
      
      This commit fixes the issues with htb_destroy by deferring assignment of
      q->offload until after the TC_HTB_CREATE command. The necessary cleanup
      of the offload entities is already done in htb_init.
      
      Reported-by: syzbot+b53a709f04722ca12a3c@syzkaller.appspotmail.com
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fb3a3e37
    • M
      sch_htb: Fix select_queue for non-offload mode · 93bde210
      Maxim Mikityanskiy 提交于
      htb_select_queue assumes it's always the offload mode, and it ends up in
      calling ndo_setup_tc without any checks. It may lead to a NULL pointer
      dereference if ndo_setup_tc is not implemented, or to an error returned
      from the driver, which will prevent attaching qdiscs to HTB classes in
      the non-offload mode.
      
      This commit fixes the bug by adding the missing check to
      htb_select_queue. In the non-offload mode it will return sch->dev_queue,
      mimicking tc_modify_qdisc's behavior for the case where select_queue is
      not implemented.
      
      Reported-by: syzbot+b53a709f04722ca12a3c@syzkaller.appspotmail.com
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Signed-off-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93bde210
  8. 11 3月, 2021 1 次提交
    • E
      net: sched: validate stab values · e323d865
      Eric Dumazet 提交于
      iproute2 package is well behaved, but malicious user space can
      provide illegal shift values and trigger UBSAN reports.
      
      Add stab parameter to red_check_params() to validate user input.
      
      syzbot reported:
      
      UBSAN: shift-out-of-bounds in ./include/net/red.h:312:18
      shift exponent 111 is too large for 64-bit type 'long unsigned int'
      CPU: 1 PID: 14662 Comm: syz-executor.3 Not tainted 5.12.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x141/0x1d7 lib/dump_stack.c:120
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
       __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
       red_calc_qavg_from_idle_time include/net/red.h:312 [inline]
       red_calc_qavg include/net/red.h:353 [inline]
       choke_enqueue.cold+0x18/0x3dd net/sched/sch_choke.c:221
       __dev_xmit_skb net/core/dev.c:3837 [inline]
       __dev_queue_xmit+0x1943/0x2e00 net/core/dev.c:4150
       neigh_hh_output include/net/neighbour.h:499 [inline]
       neigh_output include/net/neighbour.h:508 [inline]
       ip6_finish_output2+0x911/0x1700 net/ipv6/ip6_output.c:117
       __ip6_finish_output net/ipv6/ip6_output.c:182 [inline]
       __ip6_finish_output+0x4c1/0xe10 net/ipv6/ip6_output.c:161
       ip6_finish_output+0x35/0x200 net/ipv6/ip6_output.c:192
       NF_HOOK_COND include/linux/netfilter.h:290 [inline]
       ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:215
       dst_output include/net/dst.h:448 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       NF_HOOK include/linux/netfilter.h:295 [inline]
       ip6_xmit+0x127e/0x1eb0 net/ipv6/ip6_output.c:320
       inet6_csk_xmit+0x358/0x630 net/ipv6/inet6_connection_sock.c:135
       dccp_transmit_skb+0x973/0x12c0 net/dccp/output.c:138
       dccp_send_reset+0x21b/0x2b0 net/dccp/output.c:535
       dccp_finish_passive_close net/dccp/proto.c:123 [inline]
       dccp_finish_passive_close+0xed/0x140 net/dccp/proto.c:118
       dccp_terminate_connection net/dccp/proto.c:958 [inline]
       dccp_close+0xb3c/0xe60 net/dccp/proto.c:1028
       inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
       inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478
       __sock_release+0xcd/0x280 net/socket.c:599
       sock_close+0x18/0x20 net/socket.c:1258
       __fput+0x288/0x920 fs/file_table.c:280
       task_work_run+0xdd/0x1a0 kernel/task_work.c:140
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
      
      Fixes: 8afa10cb ("net_sched: red: Avoid illegal values")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e323d865
  9. 05 3月, 2021 1 次提交
  10. 24 2月, 2021 1 次提交
  11. 17 2月, 2021 1 次提交
    • V
      net: sched: fix police ext initialization · 396d7f23
      Vlad Buslov 提交于
      When police action is created by cls API tcf_exts_validate() first
      conditional that calls tcf_action_init_1() directly, the action idr is not
      updated according to latest changes in action API that require caller to
      commit newly created action to idr with tcf_idr_insert_many(). This results
      such action not being accessible through act API and causes crash reported
      by syzbot:
      
      ==================================================================
      BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:71 [inline]
      BUG: KASAN: null-ptr-deref in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
      BUG: KASAN: null-ptr-deref in __tcf_idr_release net/sched/act_api.c:178 [inline]
      BUG: KASAN: null-ptr-deref in tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
      Read of size 4 at addr 0000000000000010 by task kworker/u4:5/204
      
      CPU: 0 PID: 204 Comm: kworker/u4:5 Not tainted 5.11.0-rc7-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: netns cleanup_net
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       __kasan_report mm/kasan/report.c:400 [inline]
       kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413
       check_memory_region_inline mm/kasan/generic.c:179 [inline]
       check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
       instrument_atomic_read include/linux/instrumented.h:71 [inline]
       atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
       __tcf_idr_release net/sched/act_api.c:178 [inline]
       tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
       tc_action_net_exit include/net/act_api.h:151 [inline]
       police_exit_net+0x168/0x360 net/sched/act_police.c:390
       ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
       cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604
       process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
       worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
       kthread+0x3b1/0x4a0 kernel/kthread.c:292
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
      ==================================================================
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 204 Comm: kworker/u4:5 Tainted: G    B             5.11.0-rc7-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: netns cleanup_net
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       panic+0x306/0x73d kernel/panic.c:231
       end_report+0x58/0x5e mm/kasan/report.c:100
       __kasan_report mm/kasan/report.c:403 [inline]
       kasan_report.cold+0x67/0xd5 mm/kasan/report.c:413
       check_memory_region_inline mm/kasan/generic.c:179 [inline]
       check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
       instrument_atomic_read include/linux/instrumented.h:71 [inline]
       atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
       __tcf_idr_release net/sched/act_api.c:178 [inline]
       tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
       tc_action_net_exit include/net/act_api.h:151 [inline]
       police_exit_net+0x168/0x360 net/sched/act_police.c:390
       ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
       cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604
       process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
       worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
       kthread+0x3b1/0x4a0 kernel/kthread.c:292
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
      Kernel Offset: disabled
      
      Fix the issue by calling tcf_idr_insert_many() after successful action
      initialization.
      
      Fixes: 0fedc63f ("net_sched: commit action insertions together")
      Reported-by: syzbot+151e3e714d34ae4ce7e8@syzkaller.appspotmail.com
      Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      396d7f23
  12. 11 2月, 2021 1 次提交
  13. 07 2月, 2021 1 次提交
  14. 30 1月, 2021 2 次提交
  15. 23 1月, 2021 3 次提交
    • M
      sch_htb: Stats for offloaded HTB · 83271586
      Maxim Mikityanskiy 提交于
      This commit adds support for statistics of offloaded HTB. Bytes and
      packets counters for leaf and inner nodes are supported, the values are
      taken from per-queue qdiscs, and the numbers that the user sees should
      have the same behavior as the software (non-offloaded) HTB.
      Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      83271586
    • M
      sch_htb: Hierarchical QoS hardware offload · d03b195b
      Maxim Mikityanskiy 提交于
      HTB doesn't scale well because of contention on a single lock, and it
      also consumes CPU. This patch adds support for offloading HTB to
      hardware that supports hierarchical rate limiting.
      
      In the offload mode, HTB passes control commands to the driver using
      ndo_setup_tc. The driver has to replicate the whole hierarchy of classes
      and their settings (rate, ceil) in the NIC. Every modification of the
      HTB tree caused by the admin results in ndo_setup_tc being called.
      
      After this setup, the HTB algorithm is done completely in the NIC. An SQ
      (send queue) is created for every leaf class and attached to the
      hierarchy, so that the NIC can calculate and obey aggregated rate
      limits, too. In the future, it can be changed, so that multiple SQs will
      back a single leaf class.
      
      ndo_select_queue is responsible for selecting the right queue that
      serves the traffic class of each packet.
      
      The data path works as follows: a packet is classified by clsact, the
      driver selects a hardware queue according to its class, and the packet
      is enqueued into this queue's qdisc.
      
      This solution addresses two main problems of scaling HTB:
      
      1. Contention by flow classification. Currently the filters are attached
      to the HTB instance as follows:
      
          # tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80
          classid 1:10
      
      It's possible to move classification to clsact egress hook, which is
      thread-safe and lock-free:
      
          # tc filter add dev eth0 egress protocol ip flower dst_port 80
          action skbedit priority 1:10
      
      This way classification still happens in software, but the lock
      contention is eliminated, and it happens before selecting the TX queue,
      allowing the driver to translate the class to the corresponding hardware
      queue in ndo_select_queue.
      
      Note that this is already compatible with non-offloaded HTB and doesn't
      require changes to the kernel nor iproute2.
      
      2. Contention by handling packets. HTB is not multi-queue, it attaches
      to a whole net device, and handling of all packets takes the same lock.
      When HTB is offloaded, it registers itself as a multi-queue qdisc,
      similarly to mq: HTB is attached to the netdev, and each queue has its
      own qdisc.
      
      Some features of HTB may be not supported by some particular hardware,
      for example, the maximum number of classes may be limited, the
      granularity of rate and ceil parameters may be different, etc. - so, the
      offload is not enabled by default, a new parameter is used to enable it:
      
          # tc qdisc replace dev eth0 root handle 1: htb offload
      Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      d03b195b
    • M
      net: sched: Add extack to Qdisc_class_ops.delete · 4dd78a73
      Maxim Mikityanskiy 提交于
      In a following commit, sch_htb will start using extack in the delete
      class operation to pass hardware errors in offload mode. This commit
      prepares for that by adding the extack parameter to this callback and
      converting usage of the existing qdiscs.
      Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      4dd78a73
  16. 21 1月, 2021 1 次提交
  17. 20 1月, 2021 1 次提交
  18. 19 1月, 2021 1 次提交
    • C
      net_sched: fix RTNL deadlock again caused by request_module() · d349f997
      Cong Wang 提交于
      tcf_action_init_1() loads tc action modules automatically with
      request_module() after parsing the tc action names, and it drops RTNL
      lock and re-holds it before and after request_module(). This causes a
      lot of troubles, as discovered by syzbot, because we can be in the
      middle of batch initializations when we create an array of tc actions.
      
      One of the problem is deadlock:
      
      CPU 0					CPU 1
      rtnl_lock();
      for (...) {
        tcf_action_init_1();
          -> rtnl_unlock();
          -> request_module();
      				rtnl_lock();
      				for (...) {
      				  tcf_action_init_1();
      				    -> tcf_idr_check_alloc();
      				   // Insert one action into idr,
      				   // but it is not committed until
      				   // tcf_idr_insert_many(), then drop
      				   // the RTNL lock in the _next_
      				   // iteration
      				   -> rtnl_unlock();
          -> rtnl_lock();
          -> a_o->init();
            -> tcf_idr_check_alloc();
            // Now waiting for the same index
            // to be committed
      				    -> request_module();
      				    -> rtnl_lock()
      				    // Now waiting for RTNL lock
      				}
      				rtnl_unlock();
      }
      rtnl_unlock();
      
      This is not easy to solve, we can move the request_module() before
      this loop and pre-load all the modules we need for this netlink
      message and then do the rest initializations. So the loop breaks down
      to two now:
      
              for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
                      struct tc_action_ops *a_o;
      
                      a_o = tc_action_load_ops(name, tb[i]...);
                      ops[i - 1] = a_o;
              }
      
              for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
                      act = tcf_action_init_1(ops[i - 1]...);
              }
      
      Although this looks serious, it only has been reported by syzbot, so it
      seems hard to trigger this by humans. And given the size of this patch,
      I'd suggest to make it to net-next and not to backport to stable.
      
      This patch has been tested by syzbot and tested with tdc.py by me.
      
      Fixes: 0fedc63f ("net_sched: commit action insertions together")
      Reported-and-tested-by: syzbot+82752bc5331601cf4899@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+b3b63b6bff456bd95294@syzkaller.appspotmail.com
      Reported-by: syzbot+ba67b12b1ca729912834@syzkaller.appspotmail.com
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <cong.wang@bytedance.com>
      Tested-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20210117005657.14810-1-xiyou.wangcong@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      d349f997
  19. 16 1月, 2021 3 次提交
    • E
      net_sched: avoid shift-out-of-bounds in tcindex_set_parms() · bcd0cf19
      Eric Dumazet 提交于
      tc_index being 16bit wide, we need to check that TCA_TCINDEX_SHIFT
      attribute is not silly.
      
      UBSAN: shift-out-of-bounds in net/sched/cls_tcindex.c:260:29
      shift exponent 255 is too large for 32-bit type 'int'
      CPU: 0 PID: 8516 Comm: syz-executor228 Not tainted 5.10.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
       __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
       valid_perfect_hash net/sched/cls_tcindex.c:260 [inline]
       tcindex_set_parms.cold+0x1b/0x215 net/sched/cls_tcindex.c:425
       tcindex_change+0x232/0x340 net/sched/cls_tcindex.c:546
       tc_new_tfilter+0x13fb/0x21b0 net/sched/cls_api.c:2127
       rtnetlink_rcv_msg+0x8b6/0xb80 net/core/rtnetlink.c:5555
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
       netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
       netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2336
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2390
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2423
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20210114185229.1742255-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      bcd0cf19
    • E
      net_sched: reject silly cell_log in qdisc_get_rtab() · e4bedf48
      Eric Dumazet 提交于
      iproute2 probably never goes beyond 8 for the cell exponent,
      but stick to the max shift exponent for signed 32bit.
      
      UBSAN reported:
      UBSAN: shift-out-of-bounds in net/sched/sch_api.c:389:22
      shift exponent 130 is too large for 32-bit type 'int'
      CPU: 1 PID: 8450 Comm: syz-executor586 Not tainted 5.11.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x183/0x22e lib/dump_stack.c:120
       ubsan_epilogue lib/ubsan.c:148 [inline]
       __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395
       __detect_linklayer+0x2a9/0x330 net/sched/sch_api.c:389
       qdisc_get_rtab+0x2b5/0x410 net/sched/sch_api.c:435
       cbq_init+0x28f/0x12c0 net/sched/sch_cbq.c:1180
       qdisc_create+0x801/0x1470 net/sched/sch_api.c:1246
       tc_modify_qdisc+0x9e3/0x1fc0 net/sched/sch_api.c:1662
       rtnetlink_rcv_msg+0xb1d/0xe60 net/core/rtnetlink.c:5564
       netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2494
       netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
       netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1330
       netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg net/socket.c:672 [inline]
       ____sys_sendmsg+0x5a2/0x900 net/socket.c:2345
       ___sys_sendmsg net/socket.c:2399 [inline]
       __sys_sendmsg+0x319/0x400 net/socket.c:2432
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NCong Wang <cong.wang@bytedance.com>
      Link: https://lore.kernel.org/r/20210114160637.1660597-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      e4bedf48
    • C
      cls_flower: call nla_ok() before nla_next() · c96adff9
      Cong Wang 提交于
      fl_set_enc_opt() simply checks if there are still bytes left to parse,
      but this is not sufficent as syzbot seems to be able to generate
      malformatted netlink messages. nla_ok() is more strict so should be
      used to validate the next nlattr here.
      
      And nla_validate_nested_deprecated() has less strict check too, it is
      probably too late to switch to the strict version, but we can just
      call nla_ok() too after it.
      
      Reported-and-tested-by: syzbot+2624e3778b18fc497c92@syzkaller.appspotmail.com
      Fixes: 0a6e7778 ("net/sched: allow flower to match tunnel options")
      Fixes: 79b1011c ("net: sched: allow flower to match erspan options")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <cong.wang@bytedance.com>
      Link: https://lore.kernel.org/r/20210115185024.72298-1-xiyou.wangcong@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      c96adff9
  20. 29 12月, 2020 1 次提交
    • R
      net: sched: prevent invalid Scell_log shift count · bd1248f1
      Randy Dunlap 提交于
      Check Scell_log shift size in red_check_params() and modify all callers
      of red_check_params() to pass Scell_log.
      
      This prevents a shift out-of-bounds as detected by UBSAN:
        UBSAN: shift-out-of-bounds in ./include/net/red.h:252:22
        shift exponent 72 is too large for 32-bit type 'int'
      
      Fixes: 8afa10cb ("net_sched: red: Avoid illegal values")
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Reported-by: syzbot+97c5bd9cc81eca63d36e@syzkaller.appspotmail.com
      Cc: Nogah Frankel <nogahf@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: netdev@vger.kernel.org
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd1248f1
  21. 19 12月, 2020 1 次提交
  22. 18 12月, 2020 1 次提交
  23. 10 12月, 2020 2 次提交
  24. 09 12月, 2020 2 次提交
  25. 05 12月, 2020 1 次提交
    • D
      net/sched: fq_pie: initialize timer earlier in fq_pie_init() · 4eef8b1f
      Davide Caratti 提交于
      with the following tdc testcase:
      
       83be: (qdisc, fq_pie) Create FQ-PIE with invalid number of flows
      
      as fq_pie_init() fails, fq_pie_destroy() is called to clean up. Since the
      timer is not yet initialized, it's possible to observe a splat like this:
      
        INFO: trying to register non-static key.
        the code is fine but needs lockdep annotation.
        turning off the locking correctness validator.
        CPU: 0 PID: 975 Comm: tc Not tainted 5.10.0-rc4+ #298
        Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
        Call Trace:
         dump_stack+0x99/0xcb
         register_lock_class+0x12dd/0x1750
         __lock_acquire+0xfe/0x3970
         lock_acquire+0x1c8/0x7f0
         del_timer_sync+0x49/0xd0
         fq_pie_destroy+0x3f/0x80 [sch_fq_pie]
         qdisc_create+0x916/0x1160
         tc_modify_qdisc+0x3c4/0x1630
         rtnetlink_rcv_msg+0x346/0x8e0
         netlink_unicast+0x439/0x630
         netlink_sendmsg+0x719/0xbf0
         sock_sendmsg+0xe2/0x110
         ____sys_sendmsg+0x5ba/0x890
         ___sys_sendmsg+0xe9/0x160
         __sys_sendmsg+0xd3/0x170
         do_syscall_64+0x33/0x40
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [...]
        ODEBUG: assert_init not available (active state 0) object type: timer_list hint: 0x0
        WARNING: CPU: 0 PID: 975 at lib/debugobjects.c:508 debug_print_object+0x162/0x210
        [...]
        Call Trace:
         debug_object_assert_init+0x268/0x380
         try_to_del_timer_sync+0x6a/0x100
         del_timer_sync+0x9e/0xd0
         fq_pie_destroy+0x3f/0x80 [sch_fq_pie]
         qdisc_create+0x916/0x1160
         tc_modify_qdisc+0x3c4/0x1630
         rtnetlink_rcv_msg+0x346/0x8e0
         netlink_rcv_skb+0x120/0x380
         netlink_unicast+0x439/0x630
         netlink_sendmsg+0x719/0xbf0
         sock_sendmsg+0xe2/0x110
         ____sys_sendmsg+0x5ba/0x890
         ___sys_sendmsg+0xe9/0x160
         __sys_sendmsg+0xd3/0x170
         do_syscall_64+0x33/0x40
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      fix it moving timer_setup() before any failure, like it was done on 'red'
      with former commit 608b4ada ("net_sched: initialize timer earlier in
      red_init()").
      
      Fixes: ec97ecf1 ("net: sched: add Flow Queue PIE packet scheduler")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Reviewed-by: NCong Wang <cong.wang@bytedance.com>
      Link: https://lore.kernel.org/r/2e78e01c504c633ebdff18d041833cf2e079a3a4.1607020450.git.dcaratti@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      4eef8b1f
  26. 04 12月, 2020 1 次提交
  27. 02 12月, 2020 1 次提交
  28. 29 11月, 2020 1 次提交
  29. 28 11月, 2020 2 次提交