1. 07 12月, 2021 1 次提交
    • A
      ethtool: do not perform operations on net devices being unregistered · dde91ccf
      Antoine Tenart 提交于
      There is a short period between a net device starts to be unregistered
      and when it is actually gone. In that time frame ethtool operations
      could still be performed, which might end up in unwanted or undefined
      behaviours[1].
      
      Do not allow ethtool operations after a net device starts its
      unregistration. This patch targets the netlink part as the ioctl one
      isn't affected: the reference to the net device is taken and the
      operation is executed within an rtnl lock section and the net device
      won't be found after unregister.
      
      [1] For example adding Tx queues after unregister ends up in NULL
          pointer exceptions and UaFs, such as:
      
            BUG: KASAN: use-after-free in kobject_get+0x14/0x90
            Read of size 1 at addr ffff88801961248c by task ethtool/755
      
            CPU: 0 PID: 755 Comm: ethtool Not tainted 5.15.0-rc6+ #778
            Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/014
            Call Trace:
             dump_stack_lvl+0x57/0x72
             print_address_description.constprop.0+0x1f/0x140
             kasan_report.cold+0x7f/0x11b
             kobject_get+0x14/0x90
             kobject_add_internal+0x3d1/0x450
             kobject_init_and_add+0xba/0xf0
             netdev_queue_update_kobjects+0xcf/0x200
             netif_set_real_num_tx_queues+0xb4/0x310
             veth_set_channels+0x1c3/0x550
             ethnl_set_channels+0x524/0x610
      
      Fixes: 041b1c5d ("ethtool: helper functions for netlink interface")
      Suggested-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NAntoine Tenart <atenart@kernel.org>
      Link: https://lore.kernel.org/r/20211203101318.435618-1-atenart@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      dde91ccf
  2. 04 12月, 2021 2 次提交
    • L
      net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero · 2be6d4d1
      Lee Jones 提交于
      Currently, due to the sequential use of min_t() and clamp_t() macros,
      in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is not set, the logic
      sets tx_max to 0.  This is then used to allocate the data area of the
      SKB requested later in cdc_ncm_fill_tx_frame().
      
      This does not cause an issue presently because when memory is
      allocated during initialisation phase of SKB creation, more memory
      (512b) is allocated than is required for the SKB headers alone (320b),
      leaving some space (512b - 320b = 192b) for CDC data (172b).
      
      However, if more elements (for example 3 x u64 = [24b]) were added to
      one of the SKB header structs, say 'struct skb_shared_info',
      increasing its original size (320b [320b aligned]) to something larger
      (344b [384b aligned]), then suddenly the CDC data (172b) no longer
      fits in the spare SKB data area (512b - 384b = 128b).
      
      Consequently the SKB bounds checking semantics fails and panics:
      
        skbuff: skb_over_panic: text:ffffffff830a5b5f len:184 put:172   \
           head:ffff888119227c00 data:ffff888119227c00 tail:0xb8 end:0x80 dev:<NULL>
      
        ------------[ cut here ]------------
        kernel BUG at net/core/skbuff.c:110!
        RIP: 0010:skb_panic+0x14f/0x160 net/core/skbuff.c:106
        <snip>
        Call Trace:
         <IRQ>
         skb_over_panic+0x2c/0x30 net/core/skbuff.c:115
         skb_put+0x205/0x210 net/core/skbuff.c:1877
         skb_put_zero include/linux/skbuff.h:2270 [inline]
         cdc_ncm_ndp16 drivers/net/usb/cdc_ncm.c:1116 [inline]
         cdc_ncm_fill_tx_frame+0x127f/0x3d50 drivers/net/usb/cdc_ncm.c:1293
         cdc_ncm_tx_fixup+0x98/0xf0 drivers/net/usb/cdc_ncm.c:1514
      
      By overriding the max value with the default CDC_NCM_NTB_MAX_SIZE_TX
      when not offered through the system provided params, we ensure enough
      data space is allocated to handle the CDC data, meaning no crash will
      occur.
      
      Cc: Oliver Neukum <oliver@neukum.org>
      Fixes: 289507d3 ("net: cdc_ncm: use sysfs for rx/tx aggregation tuning")
      Signed-off-by: NLee Jones <lee.jones@linaro.org>
      Reviewed-by: NBjørn Mork <bjorn@mork.no>
      Link: https://lore.kernel.org/r/20211202143437.1411410-1-lee.jones@linaro.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      2be6d4d1
    • M
      qede: validate non LSO skb length · 8e227b19
      Manish Chopra 提交于
      Although it is unlikely that stack could transmit a non LSO
      skb with length > MTU, however in some cases or environment such
      occurrences actually resulted into firmware asserts due to packet
      length being greater than the max supported by the device (~9700B).
      
      This patch adds the safeguard for such odd cases to avoid firmware
      asserts.
      
      v2: Added "Fixes" tag with one of the initial driver commit
          which enabled the TX traffic actually (as this was probably
          day1 issue which was discovered recently by some customer
          environment)
      
      Fixes: a2ec6172 ("qede: Add support for link")
      Signed-off-by: NManish Chopra <manishc@marvell.com>
      Signed-off-by: NAlok Prasad <palok@marvell.com>
      Signed-off-by: NPrabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: NAriel Elior <aelior@marvell.com>
      Link: https://lore.kernel.org/r/20211203174413.13090-1-manishc@marvell.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      8e227b19
  3. 03 12月, 2021 13 次提交
    • D
      net: altera: set a couple error code in probe() · badd7857
      Dan Carpenter 提交于
      There are two error paths which accidentally return success instead of
      a negative error code.
      
      Fixes: bbd2190c ("Altera TSE: Add main and header file for Altera Ethernet Driver")
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      badd7857
    • J
      net: bcm4908: Handle dma_set_coherent_mask error codes · 128f6ec9
      Jiasheng Jiang 提交于
      The return value of dma_set_coherent_mask() is not always 0.
      To catch the exception in case that dma is not support the mask.
      
      Fixes: 9d61d138 ("net: broadcom: rename BCM4908 driver & update DT binding")
      Signed-off-by: NJiasheng Jiang <jiasheng@iscas.ac.cn>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      128f6ec9
    • L
      selftests: net/fcnal-test.sh: add exit code · 0f8a3b48
      Li Zhijian 提交于
      Previously, the selftest framework always treats it as *ok* even though
      some of them are failed actually. That's because the script always
      returns 0.
      
      It supports PASS/FAIL/SKIP exit code now.
      
      CC: Philip Li <philip.li@intel.com>
      Reported-by: Nkernel test robot <lkp@intel.com>
      Signed-off-by: NLi Zhijian <zhijianx.li@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0f8a3b48
    • E
      bonding: make tx_rebalance_counter an atomic · dac8e00f
      Eric Dumazet 提交于
      KCSAN reported a data-race [1] around tx_rebalance_counter
      which can be accessed from different contexts, without
      the protection of a lock/mutex.
      
      [1]
      BUG: KCSAN: data-race in bond_alb_init_slave / bond_alb_monitor
      
      write to 0xffff888157e8ca24 of 4 bytes by task 7075 on cpu 0:
       bond_alb_init_slave+0x713/0x860 drivers/net/bonding/bond_alb.c:1613
       bond_enslave+0xd94/0x3010 drivers/net/bonding/bond_main.c:1949
       do_set_master net/core/rtnetlink.c:2521 [inline]
       __rtnl_newlink net/core/rtnetlink.c:3475 [inline]
       rtnl_newlink+0x1298/0x13b0 net/core/rtnetlink.c:3506
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5571
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2491
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5589
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x5fc/0x6c0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x6e1/0x7d0 net/netlink/af_netlink.c:1916
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2409
       ___sys_sendmsg net/socket.c:2463 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2492
       __do_sys_sendmsg net/socket.c:2501 [inline]
       __se_sys_sendmsg net/socket.c:2499 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2499
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888157e8ca24 of 4 bytes by task 1082 on cpu 1:
       bond_alb_monitor+0x8f/0xc00 drivers/net/bonding/bond_alb.c:1511
       process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
       worker_thread+0x616/0xa70 kernel/workqueue.c:2445
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00000001 -> 0x00000064
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 1082 Comm: kworker/u4:3 Not tainted 5.16.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: bond1 bond_alb_monitor
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dac8e00f
    • E
      tcp: fix another uninit-value (sk_rx_queue_mapping) · 03cfda4f
      Eric Dumazet 提交于
      KMSAN is still not happy [1].
      
      I missed that passive connections do not inherit their
      sk_rx_queue_mapping values from the request socket,
      but instead tcp_child_process() is calling
      sk_mark_napi_id(child, skb)
      
      We have many sk_mark_napi_id() callers, so I am providing
      a new helper, forcing the setting sk_rx_queue_mapping
      and sk_napi_id.
      
      Note that we had no KMSAN report for sk_napi_id because
      passive connections got a copy of this field from the listener.
      sk_rx_queue_mapping in the other hand is inside the
      sk_dontcopy_begin/sk_dontcopy_end so sk_clone_lock()
      leaves this field uninitialized.
      
      We might remove dead code populating req->sk_rx_queue_mapping
      in the future.
      
      [1]
      
      BUG: KMSAN: uninit-value in __sk_rx_queue_set include/net/sock.h:1924 [inline]
      BUG: KMSAN: uninit-value in sk_rx_queue_update include/net/sock.h:1938 [inline]
      BUG: KMSAN: uninit-value in sk_mark_napi_id include/net/busy_poll.h:136 [inline]
      BUG: KMSAN: uninit-value in tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833
       __sk_rx_queue_set include/net/sock.h:1924 [inline]
       sk_rx_queue_update include/net/sock.h:1938 [inline]
       sk_mark_napi_id include/net/busy_poll.h:136 [inline]
       tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833
       tcp_v4_rcv+0x3d83/0x4ed0 net/ipv4/tcp_ipv4.c:2066
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
       run_ksoftirqd+0x33/0x50 kernel/softirq.c:920
       smpboot_thread_fn+0x616/0xbf0 kernel/smpboot.c:164
       kthread+0x721/0x850 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      Uninit was created at:
       __alloc_pages+0xbc7/0x10a0 mm/page_alloc.c:5409
       alloc_pages+0x8a5/0xb80
       alloc_slab_page mm/slub.c:1810 [inline]
       allocate_slab+0x287/0x1c20 mm/slub.c:1947
       new_slab mm/slub.c:2010 [inline]
       ___slab_alloc+0xbdf/0x1e90 mm/slub.c:3039
       __slab_alloc mm/slub.c:3126 [inline]
       slab_alloc_node mm/slub.c:3217 [inline]
       slab_alloc mm/slub.c:3259 [inline]
       kmem_cache_alloc+0xbb3/0x11c0 mm/slub.c:3264
       sk_prot_alloc+0xeb/0x570 net/core/sock.c:1914
       sk_clone_lock+0xd6/0x1940 net/core/sock.c:2118
       inet_csk_clone_lock+0x8d/0x6a0 net/ipv4/inet_connection_sock.c:956
       tcp_create_openreq_child+0xb1/0x1ef0 net/ipv4/tcp_minisocks.c:453
       tcp_v4_syn_recv_sock+0x268/0x2710 net/ipv4/tcp_ipv4.c:1563
       tcp_check_req+0x207c/0x2a30 net/ipv4/tcp_minisocks.c:765
       tcp_v4_rcv+0x36f5/0x4ed0 net/ipv4/tcp_ipv4.c:2047
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
      
      Fixes: 342159ee ("net: avoid dirtying sk->sk_rx_queue_mapping")
      Fixes: a37a0ee4 ("net: avoid uninit-value from tcp_conn_request")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Tested-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03cfda4f
    • E
      inet: use #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING consistently · a9418924
      Eric Dumazet 提交于
      Since commit 4e1beecc ("net/sock: Add kernel config
      SOCK_RX_QUEUE_MAPPING"),
      sk_rx_queue_mapping access is guarded by CONFIG_SOCK_RX_QUEUE_MAPPING.
      
      Fixes: 54b92e84 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Tariq Toukan <tariqt@nvidia.com>
      Acked-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9418924
    • L
      selftests/tc-testing: Fix cannot create /sys/bus/netdevsim/new_device: Directory nonexistent · db925bca
      Li Zhijian 提交于
      Install netdevsim to provide /sys/bus/netdevsim/new_device interface.
      
      It helps to fix:
       # ok 97 9a7d - Change ETS strict band without quantum # skipped - skipped - previous setup failed 11 ce7d
       #
       #
       # -----> prepare stage *** Could not execute: "echo "1 1 4" > /sys/bus/netdevsim/new_device"
       #
       # -----> prepare stage *** Error message: "/bin/sh: 1: cannot create /sys/bus/netdevsim/new_device: Directory nonexistent
       # "
       #
       # -----> prepare stage *** Aborting test run.
       #
       #
       # <_io.BufferedReader name=5> *** stdout ***
       #
      Signed-off-by: NLi Zhijian <zhijianx.li@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      db925bca
    • L
      selftests/tc-testing: add missing config · a8c9505c
      Li Zhijian 提交于
      qdiscs/fq_pie requires CONFIG_NET_SCH_FQ_PIE, otherwise tc will fail
      to create a fq_pie qdisc.
      
      It fixes following issue:
       # not ok 57 83be - Create FQ-PIE with invalid number of flows
       #       Command exited with 2, expected 0
       # Error: Specified qdisc not found.
      Signed-off-by: NLi Zhijian <zhijianx.li@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8c9505c
    • L
      selftests/tc-testing: add exit code · 96f38967
      Li Zhijian 提交于
      Mark the summary result as FAIL to prevent from confusing the selftest
      framework if some of them are failed.
      
      Previously, the selftest framework always treats it as *ok* even though
      some of them are failed actually. That's because the script tdc.sh always
      return 0.
      
       # All test results:
       #
       # 1..97
       # ok 1 83be - Create FQ-PIE with invalid number of flows
       # ok 2 8b6e - Create RED with no flags
      [...snip]
       # ok 6 5f15 - Create RED with flags ECN, harddrop
       # ok 7 53e8 - Create RED with flags ECN, nodrop
       # ok 8 d091 - Fail to create RED with only nodrop flag
       # ok 9 af8e - Create RED with flags ECN, nodrop, harddrop
       # not ok 10 ce7d - Add mq Qdisc to multi-queue device (4 queues)
       #       Could not match regex pattern. Verify command output:
       # qdisc mq 1: root
       # qdisc fq_codel 0: parent 1:4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
       # qdisc fq_codel 0: parent 1:3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
      [...snip]
       # ok 96 6979 - Change quantum of a strict ETS band
       # ok 97 9a7d - Change ETS strict band without quantum
       #
       #
       #
       #
       ok 1 selftests: tc-testing: tdc.sh <<< summary result
      
      CC: Philip Li <philip.li@intel.com>
      Reported-by: Nkernel test robot <lkp@intel.com>
      Signed-off-by: NLi Zhijian <zhijianx.li@intel.com>
      Acked-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96f38967
    • P
      selftests/fib_tests: Rework fib_rp_filter_test() · f6071e5e
      Peilin Ye 提交于
      Currently rp_filter tests in fib_tests.sh:fib_rp_filter_test() are
      failing.  ping sockets are bound to dummy1 using the "-I" option
      (SO_BINDTODEVICE), but socket lookup is failing when receiving ping
      replies, since the routing table thinks they belong to dummy0.
      
      For example, suppose ping is using a SOCK_RAW socket for ICMP messages.
      When receiving ping replies, in __raw_v4_lookup(), sk->sk_bound_dev_if
      is 3 (dummy1), but dif (skb_rtable(skb)->rt_iif) says 2 (dummy0), so the
      raw_sk_bound_dev_eq() check fails.  Similar things happen in
      ping_lookup() for SOCK_DGRAM sockets.
      
      These tests used to pass due to a bug [1] in iputils, where "ping -I"
      actually did not bind ICMP message sockets to device.  The bug has been
      fixed by iputils commit f455fee41c07 ("ping: also bind the ICMP socket
      to the specific device") in 2016, which is why our rp_filter tests
      started to fail.  See [2] .
      
      Fixing the tests while keeping everything in one netns turns out to be
      nontrivial.  Rework the tests and build the following topology:
      
       ┌─────────────────────────────┐    ┌─────────────────────────────┐
       │  network namespace 1 (ns1)  │    │  network namespace 2 (ns2)  │
       │                             │    │                             │
       │  ┌────┐     ┌─────┐         │    │  ┌─────┐            ┌────┐  │
       │  │ lo │<───>│veth1│<────────┼────┼─>│veth2│<──────────>│ lo │  │
       │  └────┘     ├─────┴──────┐  │    │  ├─────┴──────┐     └────┘  │
       │             │192.0.2.1/24│  │    │  │192.0.2.1/24│             │
       │             └────────────┘  │    │  └────────────┘             │
       └─────────────────────────────┘    └─────────────────────────────┘
      
      Consider sending an ICMP_ECHO packet A in ns2.  Both source and
      destination IP addresses are 192.0.2.1, and we use strict mode rp_filter
      in both ns1 and ns2:
      
        1. A is routed to lo since its destination IP address is one of ns2's
           local addresses (veth2);
        2. A is redirected from lo's egress to veth2's egress using mirred;
        3. A arrives at veth1's ingress in ns1;
        4. A is redirected from veth1's ingress to lo's ingress, again, using
           mirred;
        5. In __fib_validate_source(), fib_info_nh_uses_dev() returns false,
           since A was received on lo, but reverse path lookup says veth1;
        6. However A is not dropped since we have relaxed this check for lo in
           commit 66f82095 ("fib: relax source validation check for loopback
           packets");
      
      Making sure A is not dropped here in this corner case is the whole point
      of having this test.
      
        7. As A reaches the ICMP layer, an ICMP_ECHOREPLY packet, B, is
           generated;
        8. Similarly, B is redirected from lo's egress to veth1's egress (in
           ns1), then redirected once again from veth2's ingress to lo's
           ingress (in ns2), using mirred.
      
      Also test "ping 127.0.0.1" from ns2.  It does not trigger the relaxed
      check in __fib_validate_source(), but just to make sure the topology
      works with loopback addresses.
      
      Tested with ping from iputils 20210722-41-gf9fb573:
      
      $ ./fib_tests.sh -t rp_filter
      
      IPv4 rp_filter tests
          TEST: rp_filter passes local packets		[ OK ]
          TEST: rp_filter passes loopback packets		[ OK ]
      
      [1] https://github.com/iputils/iputils/issues/55
      [2] https://github.com/iputils/iputils/commit/f455fee41c077d4b700a473b2f5b3487b8febc1dReported-by: NHangbin Liu <liuhangbin@gmail.com>
      Fixes: adb701d6 ("selftests: add a test case for rp_filter")
      Reviewed-by: NCong Wang <cong.wang@bytedance.com>
      Signed-off-by: NPeilin Ye <peilin.ye@bytedance.com>
      Acked-by: NDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20211201004720.6357-1-yepeilin.cs@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      f6071e5e
    • L
      Merge tag 'net-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · a51e3ac4
      Linus Torvalds 提交于
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from wireless, and wireguard.
      
        Mostly scattered driver changes this week, with one big clump in
        mv88e6xxx. Nothing of note, really.
      
        Current release - regressions:
      
         - smc: keep smc_close_final()'s error code during active close
      
        Current release - new code bugs:
      
         - iwlwifi: various static checker fixes (int overflow, leaks, missing
           error codes)
      
         - rtw89: fix size of firmware header before transfer, avoid crash
      
         - mt76: fix timestamp check in tx_status; fix pktid leak;
      
         - mscc: ocelot: fix missing unlock on error in ocelot_hwstamp_set()
      
        Previous releases - regressions:
      
         - smc: fix list corruption in smc_lgr_cleanup_early
      
         - ipv4: convert fib_num_tclassid_users to atomic_t
      
        Previous releases - always broken:
      
         - tls: fix authentication failure in CCM mode
      
         - vrf: reset IPCB/IP6CB when processing outbound pkts, prevent
           incorrect processing
      
         - dsa: mv88e6xxx: fixes for various device errata
      
         - rds: correct socket tunable error in rds_tcp_tune()
      
         - ipv6: fix memory leak in fib6_rule_suppress
      
         - wireguard: reset peer src endpoint when netns exits
      
         - wireguard: improve resilience to DoS around incoming handshakes
      
         - tcp: fix page frag corruption on page fault which involves TCP
      
         - mpls: fix missing attributes in delete notifications
      
         - mt7915: fix NULL pointer dereference with ad-hoc mode
      
        Misc:
      
         - rt2x00: be more lenient about EPROTO errors during start
      
         - mlx4_en: update reported link modes for 1/10G"
      
      * tag 'net-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
        net: dsa: b53: Add SPI ID table
        gro: Fix inconsistent indenting
        selftests: net: Correct case name
        net/rds: correct socket tunable error in rds_tcp_tune()
        mctp: Don't let RTM_DELROUTE delete local routes
        net/smc: Keep smc_close_final rc during active close
        ibmvnic: drop bad optimization in reuse_tx_pools()
        ibmvnic: drop bad optimization in reuse_rx_pools()
        net/smc: fix wrong list_del in smc_lgr_cleanup_early
        Fix Comment of ETH_P_802_3_MIN
        ethernet: aquantia: Try MAC address from device tree
        ipv4: convert fib_num_tclassid_users to atomic_t
        net: avoid uninit-value from tcp_conn_request
        net: annotate data-races on txq->xmit_lock_owner
        octeontx2-af: Fix a memleak bug in rvu_mbox_init()
        net/mlx4_en: Fix an use-after-free bug in mlx4_en_try_alloc_resources()
        vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit
        net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings()
        net: dsa: mv88e6xxx: Link in pcs_get_state() if AN is bypassed
        net: dsa: mv88e6xxx: Fix inband AN for 2500base-x on 88E6393X family
        ...
      a51e3ac4
    • L
      Merge tag 'trace-v5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 2b2c0f24
      Linus Torvalds 提交于
      Pull tracing fixes from Steven Rostedt:
       "Three tracing fixes:
      
         - Allow compares of strings when using signed and unsigned characters
      
         - Fix kmemleak false positive for histogram entries
      
         - Handle negative numbers for user defined kretprobe data sizes"
      
      * tag 'trace-v5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        kprobes: Limit max data_size of the kretprobe instances
        tracing: Fix a kmemleak false positive in tracing_map
        tracing/histograms: String compares should not care about signed values
      2b2c0f24
    • L
      Merge tag 'for-linus-5.16-2' of git://github.com/cminyard/linux-ipmi · df365887
      Linus Torvalds 提交于
      Pull IPMI fixes from Corey Minyard:
       "Some changes that went in 5.16 had issues. When working on the design
        a piece was redesigned and things got missed. And the message type was
        not being initialized when it was allocated, resulting in crashes.
      
        In addition, the IPMI driver has had a shutdown issue where it could
        still have an item in a system workqueue after it had been shutdown.
        Move to a private workqueue to avoid that problem"
      
      * tag 'for-linus-5.16-2' of git://github.com/cminyard/linux-ipmi:
        ipmi:ipmb: Fix unknown command response
        ipmi: fix IPMI_SMI_MSG_TYPE_IPMB_DIRECT response length checking
        ipmi: fix oob access due to uninit smi_msg type
        ipmi: msghandler: Make symbol 'remove_work_wq' static
        ipmi: Move remove_work to dedicated workqueue
      df365887
  4. 02 12月, 2021 22 次提交
    • F
      net: dsa: b53: Add SPI ID table · 88362ebf
      Florian Fainelli 提交于
      Currently autoloading for SPI devices does not use the DT ID table, it
      uses SPI modalises. Supporting OF modalises is going to be difficult if
      not impractical, an attempt was made but has been reverted, so ensure
      that module autoloading works for this driver by adding an id_table
      listing the SPI IDs for everything.
      
      Fixes: 96c8395e ("spi: Revert modalias changes")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88362ebf
    • J
      gro: Fix inconsistent indenting · 1ebb87cc
      Jiapeng Chong 提交于
      Eliminate the follow smatch warning:
      
      net/ipv6/ip6_offload.c:249 ipv6_gro_receive() warn: inconsistent
      indenting.
      Reported-by: NAbaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: NJiapeng Chong <jiapeng.chong@linux.alibaba.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ebb87cc
    • L
      selftests: net: Correct case name · a05431b2
      Li Zhijian 提交于
      ipv6_addr_bind/ipv4_addr_bind are function names. Previously, bind test
      would not be run by default due to the wrong case names
      
      Fixes: 34d0302a ("selftests: Add ipv6 address bind tests to fcnal-test")
      Fixes: 75b2b2b3 ("selftests: Add ipv4 address bind tests to fcnal-test")
      Signed-off-by: NLi Zhijian <lizhijian@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a05431b2
    • W
      net/rds: correct socket tunable error in rds_tcp_tune() · 19f36edf
      William Kucharski 提交于
      Correct an error where setting /proc/sys/net/rds/tcp/rds_tcp_rcvbuf would
      instead modify the socket's sk_sndbuf and would leave sk_rcvbuf untouched.
      
      Fixes: c6a58ffe ("RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket")
      Signed-off-by: NWilliam Kucharski <william.kucharski@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      19f36edf
    • M
      mctp: Don't let RTM_DELROUTE delete local routes · 76d00160
      Matt Johnston 提交于
      We need to test against the existing route type, not
      the rtm_type in the netlink request.
      
      Fixes: 83f0a0b7 ("mctp: Specify route types, require rtm_type in RTM_*ROUTE messages")
      Signed-off-by: NMatt Johnston <matt@codeconstruct.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76d00160
    • T
      net/smc: Keep smc_close_final rc during active close · 00e158fb
      Tony Lu 提交于
      When smc_close_final() returns error, the return code overwrites by
      kernel_sock_shutdown() in smc_close_active(). The return code of
      smc_close_final() is more important than kernel_sock_shutdown(), and it
      will pass to userspace directly.
      
      Fix it by keeping both return codes, if smc_close_final() raises an
      error, return it or kernel_sock_shutdown()'s.
      
      Link: https://lore.kernel.org/linux-s390/1f67548e-cbf6-0dce-82b5-10288a4583bd@linux.ibm.com/
      Fixes: 606a63c9 ("net/smc: Ensure the active closing peer first closes clcsock")
      Suggested-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NTony Lu <tonylu@linux.alibaba.com>
      Reviewed-by: NWen Gu <guwen@linux.alibaba.com>
      Acked-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      00e158fb
    • S
      ibmvnic: drop bad optimization in reuse_tx_pools() · 5b085601
      Sukadev Bhattiprolu 提交于
      When trying to decide whether or not reuse existing rx/tx pools
      we tried to allow a range of values for the pool parameters rather
      than exact matches. This was intended to reuse the resources for
      instance when switching between two VIO servers with different
      default parameters.
      
      But this optimization is incomplete and breaks when we try to
      change the number of queues for instance. The optimization needs
      to be updated, so drop it for now and simplify the code.
      
      Fixes: bbd80930 ("ibmvnic: Reuse tx pools when possible")
      Reported-by: NDany Madden <drt@linux.ibm.com>
      Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.ibm.com>
      Reviewed-by: NDany Madden <drt@linux.ibm.com>
      Reviewed-by: NRick Lindsley <ricklind@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b085601
    • S
      ibmvnic: drop bad optimization in reuse_rx_pools() · 0584f494
      Sukadev Bhattiprolu 提交于
      When trying to decide whether or not reuse existing rx/tx pools
      we tried to allow a range of values for the pool parameters rather
      than exact matches. This was intended to reuse the resources for
      instance when switching between two VIO servers with different
      default parameters.
      
      But this optimization is incomplete and breaks when we try to
      change the number of queues for instance. The optimization needs
      to be updated, so drop it for now and simplify the code.
      
      Fixes: 489de956 ("ibmvnic: Reuse rx pools when possible")
      Reported-by: NDany Madden <drt@linux.ibm.com>
      Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.ibm.com>
      Reviewed-by: NDany Madden <drt@linux.ibm.com>
      Reviewed-by: NRick Lindsley <ricklind@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0584f494
    • D
      net/smc: fix wrong list_del in smc_lgr_cleanup_early · 789b6cc2
      Dust Li 提交于
      smc_lgr_cleanup_early() meant to delete the link
      group from the link group list, but it deleted
      the list head by mistake.
      
      This may cause memory corruption since we didn't
      remove the real link group from the list and later
      memseted the link group structure.
      We got a list corruption panic when testing:
      
      [  231.277259] list_del corruption. prev->next should be ffff8881398a8000, but was 0000000000000000
      [  231.278222] ------------[ cut here ]------------
      [  231.278726] kernel BUG at lib/list_debug.c:53!
      [  231.279326] invalid opcode: 0000 [#1] SMP NOPTI
      [  231.279803] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.10.46+ #435
      [  231.280466] Hardware name: Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
      [  231.281248] Workqueue: events smc_link_down_work
      [  231.281732] RIP: 0010:__list_del_entry_valid+0x70/0x90
      [  231.282258] Code: 4c 60 82 e8 7d cc 6a 00 0f 0b 48 89 fe 48 c7 c7 88 4c
      60 82 e8 6c cc 6a 00 0f 0b 48 89 fe 48 c7 c7 c0 4c 60 82 e8 5b cc 6a 00 <0f>
      0b 48 89 fe 48 c7 c7 00 4d 60 82 e8 4a cc 6a 00 0f 0b cc cc cc
      [  231.284146] RSP: 0018:ffffc90000033d58 EFLAGS: 00010292
      [  231.284685] RAX: 0000000000000054 RBX: ffff8881398a8000 RCX: 0000000000000000
      [  231.285415] RDX: 0000000000000001 RSI: ffff88813bc18040 RDI: ffff88813bc18040
      [  231.286141] RBP: ffffffff8305ad40 R08: 0000000000000003 R09: 0000000000000001
      [  231.286873] R10: ffffffff82803da0 R11: ffffc90000033b90 R12: 0000000000000001
      [  231.287606] R13: 0000000000000000 R14: ffff8881398a8000 R15: 0000000000000003
      [  231.288337] FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
      [  231.289160] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  231.289754] CR2: 0000000000e72058 CR3: 000000010fa96006 CR4: 00000000003706f0
      [  231.290485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  231.291211] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  231.291940] Call Trace:
      [  231.292211]  smc_lgr_terminate_sched+0x53/0xa0
      [  231.292677]  smc_switch_conns+0x75/0x6b0
      [  231.293085]  ? update_load_avg+0x1a6/0x590
      [  231.293517]  ? ttwu_do_wakeup+0x17/0x150
      [  231.293907]  ? update_load_avg+0x1a6/0x590
      [  231.294317]  ? newidle_balance+0xca/0x3d0
      [  231.294716]  smcr_link_down+0x50/0x1a0
      [  231.295090]  ? __wake_up_common_lock+0x77/0x90
      [  231.295534]  smc_link_down_work+0x46/0x60
      [  231.295933]  process_one_work+0x18b/0x350
      
      Fixes: a0a62ee1 ("net/smc: separate locks for SMCD and SMCR link group lists")
      Signed-off-by: NDust Li <dust.li@linux.alibaba.com>
      Acked-by: NKarsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: NTony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      789b6cc2
    • X
      Fix Comment of ETH_P_802_3_MIN · 72f6a452
      Xiayu Zhang 提交于
      The description of ETH_P_802_3_MIN is misleading.
      The value of EthernetType in Ethernet II frame is more than 0x0600,
      the value of Length in 802.3 frame is less than 0x0600.
      Signed-off-by: NXiayu Zhang <Xiayu.Zhang@mediatek.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      72f6a452
    • T
      ethernet: aquantia: Try MAC address from device tree · 553217c2
      Tianhao Chai 提交于
      Apple M1 Mac minis (2020) with 10GE NICs do not have MAC address in the
      card, but instead need to obtain MAC addresses from the device tree. In
      this case the hardware will report an invalid MAC.
      
      Currently atlantic driver does not query the DT for MAC address and will
      randomly assign a MAC if the NIC doesn't have a permanent MAC burnt in.
      This patch causes the driver to perfer a valid MAC address from OF (if
      present) over HW self-reported MAC and only fall back to a random MAC
      address when neither of them is valid.
      Signed-off-by: NTianhao Chai <cth451@gmail.com>
      Reviewed-by: NIgor Russkikh <irusskikh@marvell.com>
      Reviewed-by: NHector Martin <marcan@marcan.st>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      553217c2
    • E
      ipv4: convert fib_num_tclassid_users to atomic_t · 213f5f8f
      Eric Dumazet 提交于
      Before commit faa041a4 ("ipv4: Create cleanup helper for fib_nh")
      changes to net->ipv4.fib_num_tclassid_users were protected by RTNL.
      
      After the change, this is no longer the case, as free_fib_info_rcu()
      runs after rcu grace period, without rtnl being held.
      
      Fixes: faa041a4 ("ipv4: Create cleanup helper for fib_nh")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      213f5f8f
    • E
      net: avoid uninit-value from tcp_conn_request · a37a0ee4
      Eric Dumazet 提交于
      A recent change triggers a KMSAN warning, because request
      sockets do not initialize @sk_rx_queue_mapping field.
      
      Add sk_rx_queue_update() helper to make our intent clear.
      
      BUG: KMSAN: uninit-value in sk_rx_queue_set include/net/sock.h:1922 [inline]
      BUG: KMSAN: uninit-value in tcp_conn_request+0x3bcc/0x4dc0 net/ipv4/tcp_input.c:6922
       sk_rx_queue_set include/net/sock.h:1922 [inline]
       tcp_conn_request+0x3bcc/0x4dc0 net/ipv4/tcp_input.c:6922
       tcp_v4_conn_request+0x218/0x2a0 net/ipv4/tcp_ipv4.c:1528
       tcp_rcv_state_process+0x2c5/0x3290 net/ipv4/tcp_input.c:6406
       tcp_v4_do_rcv+0xb4e/0x1330 net/ipv4/tcp_ipv4.c:1738
       tcp_v4_rcv+0x468d/0x4ed0 net/ipv4/tcp_ipv4.c:2100
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
       invoke_softirq+0xa4/0x130 kernel/softirq.c:432
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x76/0x130 kernel/softirq.c:648
       common_interrupt+0xb6/0xd0 arch/x86/kernel/irq.c:240
       asm_common_interrupt+0x1e/0x40
       smap_restore arch/x86/include/asm/smap.h:67 [inline]
       get_shadow_origin_ptr mm/kmsan/instrumentation.c:31 [inline]
       __msan_metadata_ptr_for_load_1+0x28/0x30 mm/kmsan/instrumentation.c:63
       tomoyo_check_acl+0x1b0/0x630 security/tomoyo/domain.c:173
       tomoyo_path_permission security/tomoyo/file.c:586 [inline]
       tomoyo_check_open_permission+0x61f/0xe10 security/tomoyo/file.c:777
       tomoyo_file_open+0x24f/0x2d0 security/tomoyo/tomoyo.c:311
       security_file_open+0xb1/0x1f0 security/security.c:1635
       do_dentry_open+0x4e4/0x1bf0 fs/open.c:809
       vfs_open+0xaf/0xe0 fs/open.c:957
       do_open fs/namei.c:3426 [inline]
       path_openat+0x52f1/0x5dd0 fs/namei.c:3559
       do_filp_open+0x306/0x760 fs/namei.c:3586
       do_sys_openat2+0x263/0x8f0 fs/open.c:1212
       do_sys_open fs/open.c:1228 [inline]
       __do_sys_open fs/open.c:1236 [inline]
       __se_sys_open fs/open.c:1232 [inline]
       __x64_sys_open+0x314/0x380 fs/open.c:1232
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was created at:
       __alloc_pages+0xbc7/0x10a0 mm/page_alloc.c:5409
       alloc_pages+0x8a5/0xb80
       alloc_slab_page mm/slub.c:1810 [inline]
       allocate_slab+0x287/0x1c20 mm/slub.c:1947
       new_slab mm/slub.c:2010 [inline]
       ___slab_alloc+0xbdf/0x1e90 mm/slub.c:3039
       __slab_alloc mm/slub.c:3126 [inline]
       slab_alloc_node mm/slub.c:3217 [inline]
       slab_alloc mm/slub.c:3259 [inline]
       kmem_cache_alloc+0xbb3/0x11c0 mm/slub.c:3264
       reqsk_alloc include/net/request_sock.h:91 [inline]
       inet_reqsk_alloc+0xaf/0x8b0 net/ipv4/tcp_input.c:6712
       tcp_conn_request+0x910/0x4dc0 net/ipv4/tcp_input.c:6852
       tcp_v4_conn_request+0x218/0x2a0 net/ipv4/tcp_ipv4.c:1528
       tcp_rcv_state_process+0x2c5/0x3290 net/ipv4/tcp_input.c:6406
       tcp_v4_do_rcv+0xb4e/0x1330 net/ipv4/tcp_ipv4.c:1738
       tcp_v4_rcv+0x468d/0x4ed0 net/ipv4/tcp_ipv4.c:2100
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
      
      Fixes: 342159ee ("net: avoid dirtying sk->sk_rx_queue_mapping")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211130182939.2584764-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      a37a0ee4
    • E
      net: annotate data-races on txq->xmit_lock_owner · 7a10d8c8
      Eric Dumazet 提交于
      syzbot found that __dev_queue_xmit() is reading txq->xmit_lock_owner
      without annotations.
      
      No serious issue there, let's document what is happening there.
      
      BUG: KCSAN: data-race in __dev_queue_xmit / __dev_queue_xmit
      
      write to 0xffff888139d09484 of 4 bytes by interrupt on cpu 0:
       __netif_tx_unlock include/linux/netdevice.h:4437 [inline]
       __dev_queue_xmit+0x948/0xf70 net/core/dev.c:4229
       dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
       macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
       macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
       __netdev_start_xmit include/linux/netdevice.h:4987 [inline]
       netdev_start_xmit include/linux/netdevice.h:5001 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3590
       dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
       sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
       __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
       __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
       neigh_hh_output include/net/neighbour.h:511 [inline]
       neigh_output include/net/neighbour.h:525 [inline]
       ip6_finish_output2+0x995/0xbb0 net/ipv6/ip6_output.c:126
       __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
       ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
       dst_output include/net/dst.h:450 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
       ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
       addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
       call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
       expire_timers+0x116/0x240 kernel/time/timer.c:1466
       __run_timers+0x368/0x410 kernel/time/timer.c:1734
       run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
       __do_softirq+0x158/0x2de kernel/softirq.c:558
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
       sysvec_apic_timer_interrupt+0x3e/0xb0 arch/x86/kernel/apic/apic.c:1097
       asm_sysvec_apic_timer_interrupt+0x12/0x20
      
      read to 0xffff888139d09484 of 4 bytes by interrupt on cpu 1:
       __dev_queue_xmit+0x5e3/0xf70 net/core/dev.c:4213
       dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
       macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
       macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
       __netdev_start_xmit include/linux/netdevice.h:4987 [inline]
       netdev_start_xmit include/linux/netdevice.h:5001 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3590
       dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
       sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
       __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
       __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
       neigh_resolve_output+0x3db/0x410 net/core/neighbour.c:1523
       neigh_output include/net/neighbour.h:527 [inline]
       ip6_finish_output2+0x9be/0xbb0 net/ipv6/ip6_output.c:126
       __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
       ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
       dst_output include/net/dst.h:450 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
       ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
       addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
       call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
       expire_timers+0x116/0x240 kernel/time/timer.c:1466
       __run_timers+0x368/0x410 kernel/time/timer.c:1734
       run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
       __do_softirq+0x158/0x2de kernel/softirq.c:558
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
       sysvec_apic_timer_interrupt+0x8d/0xb0 arch/x86/kernel/apic/apic.c:1097
       asm_sysvec_apic_timer_interrupt+0x12/0x20
       kcsan_setup_watchpoint+0x94/0x420 kernel/kcsan/core.c:443
       folio_test_anon include/linux/page-flags.h:581 [inline]
       PageAnon include/linux/page-flags.h:586 [inline]
       zap_pte_range+0x5ac/0x10e0 mm/memory.c:1347
       zap_pmd_range mm/memory.c:1467 [inline]
       zap_pud_range mm/memory.c:1496 [inline]
       zap_p4d_range mm/memory.c:1517 [inline]
       unmap_page_range+0x2dc/0x3d0 mm/memory.c:1538
       unmap_single_vma+0x157/0x210 mm/memory.c:1583
       unmap_vmas+0xd0/0x180 mm/memory.c:1615
       exit_mmap+0x23d/0x470 mm/mmap.c:3170
       __mmput+0x27/0x1b0 kernel/fork.c:1113
       mmput+0x3d/0x50 kernel/fork.c:1134
       exit_mm+0xdb/0x170 kernel/exit.c:507
       do_exit+0x608/0x17a0 kernel/exit.c:819
       do_group_exit+0xce/0x180 kernel/exit.c:929
       get_signal+0xfc3/0x1550 kernel/signal.c:2852
       arch_do_signal_or_restart+0x8c/0x2e0 arch/x86/kernel/signal.c:868
       handle_signal_work kernel/entry/common.c:148 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
       exit_to_user_mode_prepare+0x113/0x190 kernel/entry/common.c:207
       __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
       do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000000 -> 0xffffffff
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 28712 Comm: syz-executor.0 Tainted: G        W         5.16.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211130170155.2331929-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      7a10d8c8
    • Z
      octeontx2-af: Fix a memleak bug in rvu_mbox_init() · e07a097b
      Zhou Qingyang 提交于
      In rvu_mbox_init(), mbox_regions is not freed or passed out
      under the switch-default region, which could lead to a memory leak.
      
      Fix this bug by changing 'return err' to 'goto free_regions'.
      
      This bug was found by a static analyzer. The analysis employs
      differential checking to identify inconsistent security operations
      (e.g., checks or kfrees) between two code paths and confirms that the
      inconsistent operations are not recovered in the current function or
      the callers, so they constitute bugs.
      
      Note that, as a bug found by static analysis, it can be a false
      positive or hard to trigger. Multiple researchers have cross-reviewed
      the bug.
      
      Builds with CONFIG_OCTEONTX2_AF=y show no new warnings,
      and our static analyzer no longer warns about this code.
      
      Fixes: 98c56111 (“octeontx2-af: cn10k: Add mbox support for CN10K platform”)
      Signed-off-by: NZhou Qingyang <zhou1615@umn.edu>
      Link: https://lore.kernel.org/r/20211130165039.192426-1-zhou1615@umn.eduSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      e07a097b
    • Z
      net/mlx4_en: Fix an use-after-free bug in mlx4_en_try_alloc_resources() · addad764
      Zhou Qingyang 提交于
      In mlx4_en_try_alloc_resources(), mlx4_en_copy_priv() is called and
      tmp->tx_cq will be freed on the error path of mlx4_en_copy_priv().
      After that mlx4_en_alloc_resources() is called and there is a dereference
      of &tmp->tx_cq[t][i] in mlx4_en_alloc_resources(), which could lead to
      a use after free problem on failure of mlx4_en_copy_priv().
      
      Fix this bug by adding a check of mlx4_en_copy_priv()
      
      This bug was found by a static analyzer. The analysis employs
      differential checking to identify inconsistent security operations
      (e.g., checks or kfrees) between two code paths and confirms that the
      inconsistent operations are not recovered in the current function or
      the callers, so they constitute bugs.
      
      Note that, as a bug found by static analysis, it can be a false
      positive or hard to trigger. Multiple researchers have cross-reviewed
      the bug.
      
      Builds with CONFIG_MLX4_EN=m show no new warnings,
      and our static analyzer no longer warns about this code.
      
      Fixes: ec25bc04 ("net/mlx4_en: Add resilience in low memory systems")
      Signed-off-by: NZhou Qingyang <zhou1615@umn.edu>
      Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20211130164438.190591-1-zhou1615@umn.eduSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      addad764
    • S
      vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit · ee201011
      Stephen Suryaputra 提交于
      IPCB/IP6CB need to be initialized when processing outbound v4 or v6 pkts
      in the codepath of vrf device xmit function so that leftover garbage
      doesn't cause futher code that uses the CB to incorrectly process the
      pkt.
      
      One occasion of the issue might occur when MPLS route uses the vrf
      device as the outgoing device such as when the route is added using "ip
      -f mpls route add <label> dev <vrf>" command.
      
      The problems seems to exist since day one. Hence I put the day one
      commits on the Fixes tags.
      
      Fixes: 193125db ("net: Introduce VRF device driver")
      Fixes: 35402e31 ("net: Add IPv6 support to VRF device")
      Cc: stable@vger.kernel.org
      Signed-off-by: NStephen Suryaputra <ssuryaextr@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20211130162637.3249-1-ssuryaextr@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      ee201011
    • Z
      net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings() · e2dabc4f
      Zhou Qingyang 提交于
      In qlcnic_83xx_add_rings(), the indirect function of
      ahw->hw_ops->alloc_mbx_args will be called to allocate memory for
      cmd.req.arg, and there is a dereference of it in qlcnic_83xx_add_rings(),
      which could lead to a NULL pointer dereference on failure of the
      indirect function like qlcnic_83xx_alloc_mbx_args().
      
      Fix this bug by adding a check of alloc_mbx_args(), this patch
      imitates the logic of mbx_cmd()'s failure handling.
      
      This bug was found by a static analyzer. The analysis employs
      differential checking to identify inconsistent security operations
      (e.g., checks or kfrees) between two code paths and confirms that the
      inconsistent operations are not recovered in the current function or
      the callers, so they constitute bugs.
      
      Note that, as a bug found by static analysis, it can be a false
      positive or hard to trigger. Multiple researchers have cross-reviewed
      the bug.
      
      Builds with CONFIG_QLCNIC=m show no new warnings, and our
      static analyzer no longer warns about this code.
      
      Fixes: 7f966452 ("qlcnic: 83xx memory map and HW access routine")
      Signed-off-by: NZhou Qingyang <zhou1615@umn.edu>
      Link: https://lore.kernel.org/r/20211130110848.109026-1-zhou1615@umn.eduSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      e2dabc4f
    • M
      kprobes: Limit max data_size of the kretprobe instances · 6bbfa441
      Masami Hiramatsu 提交于
      The 'kprobe::data_size' is unsigned, thus it can not be negative.  But if
      user sets it enough big number (e.g. (size_t)-8), the result of 'data_size
      + sizeof(struct kretprobe_instance)' becomes smaller than sizeof(struct
      kretprobe_instance) or zero. In result, the kretprobe_instance are
      allocated without enough memory, and kretprobe accesses outside of
      allocated memory.
      
      To avoid this issue, introduce a max limitation of the
      kretprobe::data_size. 4KB per instance should be OK.
      
      Link: https://lkml.kernel.org/r/163836995040.432120.10322772773821182925.stgit@devnote2
      
      Cc: stable@vger.kernel.org
      Fixes: f47cd9b5 ("kprobes: kretprobe user entry-handler")
      Reported-by: Nzhangyue <zhangyue1@kylinos.cn>
      Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      6bbfa441
    • C
      tracing: Fix a kmemleak false positive in tracing_map · f25667e5
      Chen Jun 提交于
      Doing the command:
        echo 'hist:key=common_pid.execname,common_timestamp' > /sys/kernel/debug/tracing/events/xxx/trigger
      
      Triggers many kmemleak reports:
      
      unreferenced object 0xffff0000c7ea4980 (size 128):
        comm "bash", pid 338, jiffies 4294912626 (age 9339.324s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f3469921>] kmem_cache_alloc_trace+0x4c0/0x6f0
          [<0000000054ca40c3>] hist_trigger_elt_data_alloc+0x140/0x178
          [<00000000633bd154>] tracing_map_init+0x1f8/0x268
          [<000000007e814ab9>] event_hist_trigger_func+0xca0/0x1ad0
          [<00000000bf8520ed>] trigger_process_regex+0xd4/0x128
          [<00000000f549355a>] event_trigger_write+0x7c/0x120
          [<00000000b80f898d>] vfs_write+0xc4/0x380
          [<00000000823e1055>] ksys_write+0x74/0xf8
          [<000000008a9374aa>] __arm64_sys_write+0x24/0x30
          [<0000000087124017>] do_el0_svc+0x88/0x1c0
          [<00000000efd0dcd1>] el0_svc+0x1c/0x28
          [<00000000dbfba9b3>] el0_sync_handler+0x88/0xc0
          [<00000000e7399680>] el0_sync+0x148/0x180
      unreferenced object 0xffff0000c7ea4980 (size 128):
        comm "bash", pid 338, jiffies 4294912626 (age 9339.324s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f3469921>] kmem_cache_alloc_trace+0x4c0/0x6f0
          [<0000000054ca40c3>] hist_trigger_elt_data_alloc+0x140/0x178
          [<00000000633bd154>] tracing_map_init+0x1f8/0x268
          [<000000007e814ab9>] event_hist_trigger_func+0xca0/0x1ad0
          [<00000000bf8520ed>] trigger_process_regex+0xd4/0x128
          [<00000000f549355a>] event_trigger_write+0x7c/0x120
          [<00000000b80f898d>] vfs_write+0xc4/0x380
          [<00000000823e1055>] ksys_write+0x74/0xf8
          [<000000008a9374aa>] __arm64_sys_write+0x24/0x30
          [<0000000087124017>] do_el0_svc+0x88/0x1c0
          [<00000000efd0dcd1>] el0_svc+0x1c/0x28
          [<00000000dbfba9b3>] el0_sync_handler+0x88/0xc0
          [<00000000e7399680>] el0_sync+0x148/0x180
      
      The reason is elts->pages[i] is alloced by get_zeroed_page.
      and kmemleak will not scan the area alloced by get_zeroed_page.
      The address stored in elts->pages will be regarded as leaked.
      
      That is, the elts->pages[i] will have pointers loaded onto it as well, and
      without telling kmemleak about it, those pointers will look like memory
      without a reference.
      
      To fix this, call kmemleak_alloc to tell kmemleak to scan elts->pages[i]
      
      Link: https://lkml.kernel.org/r/20211124140801.87121-1-chenjun102@huawei.comSigned-off-by: NChen Jun <chenjun102@huawei.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      f25667e5
    • S
      tracing/histograms: String compares should not care about signed values · 450fec13
      Steven Rostedt (VMware) 提交于
      When comparing two strings for the "onmatch" histogram trigger, fields
      that are strings use string comparisons, which do not care about being
      signed or not.
      
      Do not fail to match two string fields if one is unsigned char array and
      the other is a signed char array.
      
      Link: https://lore.kernel.org/all/20211129123043.5cfd687a@gandalf.local.home/
      
      Cc: stable@vgerk.kernel.org
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Yafang Shao <laoar.shao@gmail.com>
      Fixes: b05e89ae ("tracing: Accept different type for synthetic event fields")
      Reviewed-by: NMasami Hiramatsu <mhiramatsu@kernel.org>
      Reported-by: NSven Schnelle <svens@linux.ibm.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      450fec13
    • L
      Merge tag 'sound-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 4536579b
      Linus Torvalds 提交于
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes. A large series is found for ASoC tegra
        drivers to correct the control element handlings, while others are
        mostly for device-specific quirks and fix-ups"
      
      * tag 'sound-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits)
        ALSA: hda/hdmi: fix HDA codec entry table order for ADL-P
        ALSA: hda: Add Intel DG2 PCI ID and HDMI codec vid
        ALSA: hda/cs8409: Set PMSG_ON earlier inside cs8409 driver
        ASoC: SOF: hda: reset DAI widget before reconfiguring it
        ASoC: cs35l41: Set the max SPI speed for the whole device
        ALSA: intel-dsp-config: add quirk for CML devices based on ES8336 codec
        ASoC: Intel: soc-acpi: add entry for ESSX8336 on CML
        ASoC: rk817: Add module alias for rk817-codec
        ASoC: soc-acpi: Set mach->id field on comp_ids matches
        ASoC: tegra: Fix kcontrol put callback in Mixer
        ASoC: tegra: Fix kcontrol put callback in ADX
        ASoC: tegra: Fix kcontrol put callback in AMX
        ASoC: tegra: Fix kcontrol put callback in SFC
        ASoC: tegra: Fix kcontrol put callback in MVC
        ASoC: tegra: Fix kcontrol put callback in AHUB
        ASoC: tegra: Fix kcontrol put callback in DSPK
        ASoC: tegra: Fix kcontrol put callback in DMIC
        ASoC: tegra: Fix kcontrol put callback in I2S
        ASoC: tegra: Fix kcontrol put callback in ADMAIF
        ASoC: tegra: Fix wrong value type in MVC
        ...
      4536579b
  5. 01 12月, 2021 2 次提交