1. 21 6月, 2021 5 次提交
  2. 18 6月, 2021 1 次提交
  3. 17 6月, 2021 4 次提交
    • E
      net/af_unix: fix a data-race in unix_dgram_sendmsg / unix_release_sock · a494bd64
      Eric Dumazet 提交于
      While unix_may_send(sk, osk) is called while osk is locked, it appears
      unix_release_sock() can overwrite unix_peer() after this lock has been
      released, making KCSAN unhappy.
      
      Changing unix_release_sock() to access/change unix_peer()
      before lock is released should fix this issue.
      
      BUG: KCSAN: data-race in unix_dgram_sendmsg / unix_release_sock
      
      write to 0xffff88810465a338 of 8 bytes by task 20852 on cpu 1:
       unix_release_sock+0x4ed/0x6e0 net/unix/af_unix.c:558
       unix_release+0x2f/0x50 net/unix/af_unix.c:859
       __sock_release net/socket.c:599 [inline]
       sock_close+0x6c/0x150 net/socket.c:1258
       __fput+0x25b/0x4e0 fs/file_table.c:280
       ____fput+0x11/0x20 fs/file_table.c:313
       task_work_run+0xae/0x130 kernel/task_work.c:164
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
       exit_to_user_mode_prepare+0x156/0x190 kernel/entry/common.c:209
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:302
       do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff88810465a338 of 8 bytes by task 20888 on cpu 0:
       unix_may_send net/unix/af_unix.c:189 [inline]
       unix_dgram_sendmsg+0x923/0x1610 net/unix/af_unix.c:1712
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
       ___sys_sendmsg net/socket.c:2404 [inline]
       __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
       __do_sys_sendmmsg net/socket.c:2519 [inline]
       __se_sys_sendmmsg net/socket.c:2516 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0xffff888167905400 -> 0x0000000000000000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 20888 Comm: syz-executor.0 Not tainted 5.13.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a494bd64
    • E
      net/packet: annotate accesses to po->ifindex · e032f7c9
      Eric Dumazet 提交于
      Like prior patch, we need to annotate lockless accesses to po->ifindex
      For instance, packet_getname() is reading po->ifindex (twice) while
      another thread is able to change po->ifindex.
      
      KCSAN reported:
      
      BUG: KCSAN: data-race in packet_do_bind / packet_getname
      
      write to 0xffff888143ce3cbc of 4 bytes by task 25573 on cpu 1:
       packet_do_bind+0x420/0x7e0 net/packet/af_packet.c:3191
       packet_bind+0xc3/0xd0 net/packet/af_packet.c:3255
       __sys_bind+0x200/0x290 net/socket.c:1637
       __do_sys_bind net/socket.c:1648 [inline]
       __se_sys_bind net/socket.c:1646 [inline]
       __x64_sys_bind+0x3d/0x50 net/socket.c:1646
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888143ce3cbc of 4 bytes by task 25578 on cpu 0:
       packet_getname+0x5b/0x1a0 net/packet/af_packet.c:3525
       __sys_getsockname+0x10e/0x1a0 net/socket.c:1887
       __do_sys_getsockname net/socket.c:1902 [inline]
       __se_sys_getsockname net/socket.c:1899 [inline]
       __x64_sys_getsockname+0x3e/0x50 net/socket.c:1899
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000000 -> 0x00000001
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 25578 Comm: syz-executor.5 Not tainted 5.13.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e032f7c9
    • E
      net/packet: annotate accesses to po->bind · c7d2ef5d
      Eric Dumazet 提交于
      tpacket_snd(), packet_snd(), packet_getname() and packet_seq_show()
      can read po->num without holding a lock. This means other threads
      can change po->num at the same time.
      
      KCSAN complained about this known fact [1]
      Add READ_ONCE()/WRITE_ONCE() to address the issue.
      
      [1] BUG: KCSAN: data-race in packet_do_bind / packet_sendmsg
      
      write to 0xffff888131a0dcc0 of 2 bytes by task 24714 on cpu 0:
       packet_do_bind+0x3ab/0x7e0 net/packet/af_packet.c:3181
       packet_bind+0xc3/0xd0 net/packet/af_packet.c:3255
       __sys_bind+0x200/0x290 net/socket.c:1637
       __do_sys_bind net/socket.c:1648 [inline]
       __se_sys_bind net/socket.c:1646 [inline]
       __x64_sys_bind+0x3d/0x50 net/socket.c:1646
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888131a0dcc0 of 2 bytes by task 24719 on cpu 1:
       packet_snd net/packet/af_packet.c:2899 [inline]
       packet_sendmsg+0x317/0x3570 net/packet/af_packet.c:3040
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
       ___sys_sendmsg net/socket.c:2404 [inline]
       __sys_sendmsg+0x1ed/0x270 net/socket.c:2433
       __do_sys_sendmsg net/socket.c:2442 [inline]
       __se_sys_sendmsg net/socket.c:2440 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2440
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000 -> 0x1200
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 24719 Comm: syz-executor.5 Not tainted 5.13.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7d2ef5d
    • C
      net: ipv4: fix memory leak in ip_mc_add1_src · d8e29730
      Chengyang Fan 提交于
      BUG: memory leak
      unreferenced object 0xffff888101bc4c00 (size 32):
        comm "syz-executor527", pid 360, jiffies 4294807421 (age 19.329s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
          01 00 00 00 00 00 00 00 ac 14 14 bb 00 00 02 00 ................
        backtrace:
          [<00000000f17c5244>] kmalloc include/linux/slab.h:558 [inline]
          [<00000000f17c5244>] kzalloc include/linux/slab.h:688 [inline]
          [<00000000f17c5244>] ip_mc_add1_src net/ipv4/igmp.c:1971 [inline]
          [<00000000f17c5244>] ip_mc_add_src+0x95f/0xdb0 net/ipv4/igmp.c:2095
          [<000000001cb99709>] ip_mc_source+0x84c/0xea0 net/ipv4/igmp.c:2416
          [<0000000052cf19ed>] do_ip_setsockopt net/ipv4/ip_sockglue.c:1294 [inline]
          [<0000000052cf19ed>] ip_setsockopt+0x114b/0x30c0 net/ipv4/ip_sockglue.c:1423
          [<00000000477edfbc>] raw_setsockopt+0x13d/0x170 net/ipv4/raw.c:857
          [<00000000e75ca9bb>] __sys_setsockopt+0x158/0x270 net/socket.c:2117
          [<00000000bdb993a8>] __do_sys_setsockopt net/socket.c:2128 [inline]
          [<00000000bdb993a8>] __se_sys_setsockopt net/socket.c:2125 [inline]
          [<00000000bdb993a8>] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2125
          [<000000006a1ffdbd>] do_syscall_64+0x40/0x80 arch/x86/entry/common.c:47
          [<00000000b11467c4>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      In commit 24803f38 ("igmp: do not remove igmp souce list info when set
      link down"), the ip_mc_clear_src() in ip_mc_destroy_dev() was removed,
      because it was also called in igmpv3_clear_delrec().
      
      Rough callgraph:
      
      inetdev_destroy
      -> ip_mc_destroy_dev
           -> igmpv3_clear_delrec
              -> ip_mc_clear_src
      -> RCU_INIT_POINTER(dev->ip_ptr, NULL)
      
      However, ip_mc_clear_src() called in igmpv3_clear_delrec() doesn't
      release in_dev->mc_list->sources. And RCU_INIT_POINTER() assigns the
      NULL to dev->ip_ptr. As a result, in_dev cannot be obtained through
      inetdev_by_index() and then in_dev->mc_list->sources cannot be released
      by ip_mc_del1_src() in the sock_close. Rough call sequence goes like:
      
      sock_close
      -> __sock_release
         -> inet_release
            -> ip_mc_drop_socket
               -> inetdev_by_index
               -> ip_mc_leave_src
                  -> ip_mc_del_src
                     -> ip_mc_del1_src
      
      So we still need to call ip_mc_clear_src() in ip_mc_destroy_dev() to free
      in_dev->mc_list->sources.
      
      Fixes: 24803f38 ("igmp: do not remove igmp souce list info ...")
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NChengyang Fan <cy.fan@huawei.com>
      Acked-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d8e29730
  4. 16 6月, 2021 4 次提交
  5. 15 6月, 2021 5 次提交
  6. 13 6月, 2021 1 次提交
    • C
      net: make get_net_ns return error if NET_NS is disabled · ea6932d7
      Changbin Du 提交于
      There is a panic in socket ioctl cmd SIOCGSKNS when NET_NS is not enabled.
      The reason is that nsfs tries to access ns->ops but the proc_ns_operations
      is not implemented in this case.
      
      [7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010
      [7.670268] pgd = 32b54000
      [7.670544] [00000010] *pgd=00000000
      [7.671861] Internal error: Oops: 5 [#1] SMP ARM
      [7.672315] Modules linked in:
      [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2 #16
      [7.673309] Hardware name: Generic DT based system
      [7.673642] PC is at nsfs_evict+0x24/0x30
      [7.674486] LR is at clear_inode+0x20/0x9c
      
      The same to tun SIOCGSKNS command.
      
      To fix this problem, we make get_net_ns() return -EINVAL when NET_NS is
      disabled. Meanwhile move it to right place net/core/net_namespace.c.
      Signed-off-by: NChangbin Du <changbin.du@gmail.com>
      Fixes: c62cce2c ("net: add an ioctl to get a socket network namespace")
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Suggested-by: NJakub Kicinski <kuba@kernel.org>
      Acked-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea6932d7
  7. 11 6月, 2021 12 次提交
    • P
      mptcp: fix soft lookup in subflow_error_report() · 499ada50
      Paolo Abeni 提交于
      Maxim reported a soft lookup in subflow_error_report():
      
       watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
       RIP: 0010:native_queued_spin_lock_slowpath
       RSP: 0018:ffffa859c0003bc0 EFLAGS: 00000202
       RAX: 0000000000000101 RBX: 0000000000000001 RCX: 0000000000000000
       RDX: ffff9195c2772d88 RSI: 0000000000000000 RDI: ffff9195c2772d88
       RBP: ffff9195c2772d00 R08: 00000000000067b0 R09: c6e31da9eb1e44f4
       R10: ffff9195ef379700 R11: ffff9195edb50710 R12: ffff9195c2772d88
       R13: ffff9195f500e3d0 R14: ffff9195ef379700 R15: ffff9195ef379700
       FS:  0000000000000000(0000) GS:ffff91961f400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000000c000407000 CR3: 0000000002988000 CR4: 00000000000006f0
       Call Trace:
        <IRQ>
       _raw_spin_lock_bh
       subflow_error_report
       mptcp_subflow_data_available
       __mptcp_move_skbs_from_subflow
       mptcp_data_ready
       tcp_data_queue
       tcp_rcv_established
       tcp_v4_do_rcv
       tcp_v4_rcv
       ip_protocol_deliver_rcu
       ip_local_deliver_finish
       __netif_receive_skb_one_core
       netif_receive_skb
       rtl8139_poll 8139too
       __napi_poll
       net_rx_action
       __do_softirq
       __irq_exit_rcu
       common_interrupt
        </IRQ>
      
      The calling function - mptcp_subflow_data_available() - can be invoked
      from different contexts:
      - plain ssk socket lock
      - ssk socket lock + mptcp_data_lock
      - ssk socket lock + mptcp_data_lock + msk socket lock.
      
      Since subflow_error_report() tries to acquire the mptcp_data_lock, the
      latter two call chains will cause soft lookup.
      
      This change addresses the issue moving the error reporting call to
      outer functions, where the held locks list is known and the we can
      acquire only the needed one.
      Reported-by: NMaxim Galaganov <max@internet.ru>
      Fixes: 15cc1045 ("mptcp: deliver ssk errors to msk")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/199Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      499ada50
    • P
      mptcp: do not warn on bad input from the network · 61e71022
      Paolo Abeni 提交于
      warn_bad_map() produces a kernel WARN on bad input coming
      from the network. Use pr_debug() to avoid spamming the system
      log.
      
      Additionally, when the right bound check fails, warn_bad_map() reports
      the wrong ssn value, let's fix it.
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/107Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61e71022
    • P
      mptcp: wake-up readers only for in sequence data · 99d1055c
      Paolo Abeni 提交于
      Currently we rely on the subflow->data_avail field, which is subject to
      races:
      
      	ssk1
      		skb len = 500 DSS(seq=1, len=1000, off=0)
      		# data_avail == MPTCP_SUBFLOW_DATA_AVAIL
      
      	ssk2
      		skb len = 500 DSS(seq = 501, len=1000)
      		# data_avail == MPTCP_SUBFLOW_DATA_AVAIL
      
      	ssk1
      		skb len = 500 DSS(seq = 1, len=1000, off =500)
      		# still data_avail == MPTCP_SUBFLOW_DATA_AVAIL,
      		# as the skb is covered by a pre-existing map,
      		# which was in-sequence at reception time.
      
      Instead we can explicitly check if some has been received in-sequence,
      propagating the info from __mptcp_move_skbs_from_subflow().
      
      Additionally add the 'ONCE' annotation to the 'data_avail' memory
      access, as msk will read it outside the subflow socket lock.
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      99d1055c
    • P
      mptcp: try harder to borrow memory from subflow under pressure · 72f96132
      Paolo Abeni 提交于
      If the host is under sever memory pressure, and RX forward
      memory allocation for the msk fails, we try to borrow the
      required memory from the ingress subflow.
      
      The current attempt is a bit flaky: if skb->truesize is less
      than SK_MEM_QUANTUM, the ssk will not release any memory, and
      the next schedule will fail again.
      
      Instead, directly move the required amount of pages from the
      ssk to the msk, if available
      
      Fixes: 9c3f94e1 ("mptcp: add missing memory scheduling in the rx path")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      72f96132
    • M
      sch_cake: Fix out of bounds when parsing TCP options and header · ba91c49d
      Maxim Mikityanskiy 提交于
      The TCP option parser in cake qdisc (cake_get_tcpopt and
      cake_tcph_may_drop) could read one byte out of bounds. When the length
      is 1, the execution flow gets into the loop, reads one byte of the
      opcode, and if the opcode is neither TCPOPT_EOL nor TCPOPT_NOP, it reads
      one more byte, which exceeds the length of 1.
      
      This fix is inspired by commit 9609dad2 ("ipv4: tcp_input: fix stack
      out of bounds when parsing TCP options.").
      
      v2 changes:
      
      Added doff validation in cake_get_tcphdr to avoid parsing garbage as TCP
      header. Although it wasn't strictly an out-of-bounds access (memory was
      allocated), garbage values could be read where CAKE expected the TCP
      header if doff was smaller than 5.
      
      Cc: Young Xiao <92siuyang@gmail.com>
      Fixes: 8b713881 ("sch_cake: Add optional ACK filter")
      Signed-off-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
      Acked-by: NToke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba91c49d
    • M
      mptcp: Fix out of bounds when parsing TCP options · 07718be2
      Maxim Mikityanskiy 提交于
      The TCP option parser in mptcp (mptcp_get_options) could read one byte
      out of bounds. When the length is 1, the execution flow gets into the
      loop, reads one byte of the opcode, and if the opcode is neither
      TCPOPT_EOL nor TCPOPT_NOP, it reads one more byte, which exceeds the
      length of 1.
      
      This fix is inspired by commit 9609dad2 ("ipv4: tcp_input: fix stack
      out of bounds when parsing TCP options.").
      
      Cc: Young Xiao <92siuyang@gmail.com>
      Fixes: cec37a6e ("mptcp: Handle MP_CAPABLE options for outgoing connections")
      Signed-off-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07718be2
    • M
      netfilter: synproxy: Fix out of bounds when parsing TCP options · 5fc177ab
      Maxim Mikityanskiy 提交于
      The TCP option parser in synproxy (synproxy_parse_options) could read
      one byte out of bounds. When the length is 1, the execution flow gets
      into the loop, reads one byte of the opcode, and if the opcode is
      neither TCPOPT_EOL nor TCPOPT_NOP, it reads one more byte, which exceeds
      the length of 1.
      
      This fix is inspired by commit 9609dad2 ("ipv4: tcp_input: fix stack
      out of bounds when parsing TCP options.").
      
      v2 changes:
      
      Added an early return when length < 0 to avoid calling
      skb_header_pointer with negative length.
      
      Cc: Young Xiao <92siuyang@gmail.com>
      Fixes: 48b1de4c ("netfilter: add SYNPROXY core/target")
      Signed-off-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5fc177ab
    • E
      net/packet: annotate data race in packet_sendmsg() · d1b5bee4
      Eric Dumazet 提交于
      There is a known race in packet_sendmsg(), addressed
      in commit 32d3182c ("net/packet: fix race in tpacket_snd()")
      
      Now we have data_race(), we can use it to avoid a future KCSAN warning,
      as syzbot loves stressing af_packet sockets :)
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d1b5bee4
    • N
      net: bridge: fix vlan tunnel dst refcnt when egressing · cfc579f9
      Nikolay Aleksandrov 提交于
      The egress tunnel code uses dst_clone() and directly sets the result
      which is wrong because the entry might have 0 refcnt or be already deleted,
      causing number of problems. It also triggers the WARN_ON() in dst_hold()[1]
      when a refcnt couldn't be taken. Fix it by using dst_hold_safe() and
      checking if a reference was actually taken before setting the dst.
      
      [1] dmesg WARN_ON log and following refcnt errors
       WARNING: CPU: 5 PID: 38 at include/net/dst.h:230 br_handle_egress_vlan_tunnel+0x10b/0x134 [bridge]
       Modules linked in: 8021q garp mrp bridge stp llc bonding ipv6 virtio_net
       CPU: 5 PID: 38 Comm: ksoftirqd/5 Kdump: loaded Tainted: G        W         5.13.0-rc3+ #360
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
       RIP: 0010:br_handle_egress_vlan_tunnel+0x10b/0x134 [bridge]
       Code: e8 85 bc 01 e1 45 84 f6 74 90 45 31 f6 85 db 48 c7 c7 a0 02 19 a0 41 0f 94 c6 31 c9 31 d2 44 89 f6 e8 64 bc 01 e1 85 db 75 02 <0f> 0b 31 c9 31 d2 44 89 f6 48 c7 c7 70 02 19 a0 e8 4b bc 01 e1 49
       RSP: 0018:ffff8881003d39e8 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa01902a0
       RBP: ffff8881040c6700 R08: 0000000000000000 R09: 0000000000000001
       R10: 2ce93d0054fe0d00 R11: 54fe0d00000e0000 R12: ffff888109515000
       R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000401
       FS:  0000000000000000(0000) GS:ffff88822bf40000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f42ba70f030 CR3: 0000000109926000 CR4: 00000000000006e0
       Call Trace:
        br_handle_vlan+0xbc/0xca [bridge]
        __br_forward+0x23/0x164 [bridge]
        deliver_clone+0x41/0x48 [bridge]
        br_handle_frame_finish+0x36f/0x3aa [bridge]
        ? skb_dst+0x2e/0x38 [bridge]
        ? br_handle_ingress_vlan_tunnel+0x3e/0x1c8 [bridge]
        ? br_handle_frame_finish+0x3aa/0x3aa [bridge]
        br_handle_frame+0x2c3/0x377 [bridge]
        ? __skb_pull+0x33/0x51
        ? vlan_do_receive+0x4f/0x36a
        ? br_handle_frame_finish+0x3aa/0x3aa [bridge]
        __netif_receive_skb_core+0x539/0x7c6
        ? __list_del_entry_valid+0x16e/0x1c2
        __netif_receive_skb_list_core+0x6d/0xd6
        netif_receive_skb_list_internal+0x1d9/0x1fa
        gro_normal_list+0x22/0x3e
        dev_gro_receive+0x55b/0x600
        ? detach_buf_split+0x58/0x140
        napi_gro_receive+0x94/0x12e
        virtnet_poll+0x15d/0x315 [virtio_net]
        __napi_poll+0x2c/0x1c9
        net_rx_action+0xe6/0x1fb
        __do_softirq+0x115/0x2d8
        run_ksoftirqd+0x18/0x20
        smpboot_thread_fn+0x183/0x19c
        ? smpboot_unregister_percpu_thread+0x66/0x66
        kthread+0x10a/0x10f
        ? kthread_mod_delayed_work+0xb6/0xb6
        ret_from_fork+0x22/0x30
       ---[ end trace 49f61b07f775fd2b ]---
       dst_release: dst:00000000c02d677a refcnt:-1
       dst_release underflow
      
      Cc: stable@vger.kernel.org
      Fixes: 11538d03 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
      Signed-off-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cfc579f9
    • N
      net: bridge: fix vlan tunnel dst null pointer dereference · 58e20717
      Nikolay Aleksandrov 提交于
      This patch fixes a tunnel_dst null pointer dereference due to lockless
      access in the tunnel egress path. When deleting a vlan tunnel the
      tunnel_dst pointer is set to NULL without waiting a grace period (i.e.
      while it's still usable) and packets egressing are dereferencing it
      without checking. Use READ/WRITE_ONCE to annotate the lockless use of
      tunnel_id, use RCU for accessing tunnel_dst and make sure it is read
      only once and checked in the egress path. The dst is already properly RCU
      protected so we don't need to do anything fancy than to make sure
      tunnel_id and tunnel_dst are read only once and checked in the egress path.
      
      Cc: stable@vger.kernel.org
      Fixes: 11538d03 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
      Signed-off-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58e20717
    • Z
      ping: Check return value of function 'ping_queue_rcv_skb' · 9d44fa3e
      Zheng Yongjun 提交于
      Function 'ping_queue_rcv_skb' not always return success, which will
      also return fail. If not check the wrong return value of it, lead to function
      `ping_rcv` return success.
      Signed-off-by: NZheng Yongjun <zhengyongjun3@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9d44fa3e
    • W
      skbuff: fix incorrect msg_zerocopy copy notifications · 3bdd5ee0
      Willem de Bruijn 提交于
      msg_zerocopy signals if a send operation required copying with a flag
      in serr->ee.ee_code.
      
      This field can be incorrect as of the below commit, as a result of
      both structs uarg and serr pointing into the same skb->cb[].
      
      uarg->zerocopy must be read before skb->cb[] is reinitialized to hold
      serr. Similar to other fields len, hi and lo, use a local variable to
      temporarily hold the value.
      
      This was not a problem before, when the value was passed as a function
      argument.
      
      Fixes: 75518851 ("skbuff: Push status and refcounts into sock_zerocopy_callback")
      Reported-by: NTalal Ahmad <talalahmad@google.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3bdd5ee0
  8. 10 6月, 2021 7 次提交
    • M
      net/sched: act_ct: handle DNAT tuple collision · 13c62f53
      Marcelo Ricardo Leitner 提交于
      This this the counterpart of 8aa7b526 ("openvswitch: handle DNAT
      tuple collision") for act_ct. From that commit changelog:
      
      """
      With multiple DNAT rules it's possible that after destination
      translation the resulting tuples collide.
      
      ...
      
      Netfilter handles this case by allocating a null binding for SNAT at
      egress by default.  Perform the same operation in openvswitch for DNAT
      if no explicit SNAT is requested by the user and allocate a null binding
      for SNAT for packets in the "original" direction.
      """
      
      Fixes: 95219afb ("act_ct: support asymmetric conntrack")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      13c62f53
    • I
      rtnetlink: Fix regression in bridge VLAN configuration · d2e381c4
      Ido Schimmel 提交于
      Cited commit started returning errors when notification info is not
      filled by the bridge driver, resulting in the following regression:
      
       # ip link add name br1 type bridge vlan_filtering 1
       # bridge vlan add dev br1 vid 555 self pvid untagged
       RTNETLINK answers: Invalid argument
      
      As long as the bridge driver does not fill notification info for the
      bridge device itself, an empty notification should not be considered as
      an error. This is explained in commit 59ccaaaa ("bridge: dont send
      notification when skb->len == 0 in rtnl_bridge_notify").
      
      Fix by removing the error and add a comment to avoid future bugs.
      
      Fixes: a8db57c1 ("rtnetlink: Fix missing error code in rtnl_bridge_notify()")
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2e381c4
    • P
      udp: fix race between close() and udp_abort() · a8b897c7
      Paolo Abeni 提交于
      Kaustubh reported and diagnosed a panic in udp_lib_lookup().
      The root cause is udp_abort() racing with close(). Both
      racing functions acquire the socket lock, but udp{v6}_destroy_sock()
      release it before performing destructive actions.
      
      We can't easily extend the socket lock scope to avoid the race,
      instead use the SOCK_DEAD flag to prevent udp_abort from doing
      any action when the critical race happens.
      Diagnosed-and-tested-by: NKaustubh Pandey <kapandey@codeaurora.org>
      Fixes: 5d77dca8 ("net: diag: support SOCK_DESTROY for UDP sockets")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8b897c7
    • E
      inet: annotate data race in inet_send_prepare() and inet_dgram_connect() · dcd01eea
      Eric Dumazet 提交于
      Both functions are known to be racy when reading inet_num
      as we do not want to grab locks for the common case the socket
      has been bound already. The race is resolved in inet_autobind()
      by reading again inet_num under the socket lock.
      
      syzbot reported:
      BUG: KCSAN: data-race in inet_send_prepare / udp_lib_get_port
      
      write to 0xffff88812cba150e of 2 bytes by task 24135 on cpu 0:
       udp_lib_get_port+0x4b2/0xe20 net/ipv4/udp.c:308
       udp_v6_get_port+0x5e/0x70 net/ipv6/udp.c:89
       inet_autobind net/ipv4/af_inet.c:183 [inline]
       inet_send_prepare+0xd0/0x210 net/ipv4/af_inet.c:807
       inet6_sendmsg+0x29/0x80 net/ipv6/af_inet6.c:639
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
       ___sys_sendmsg net/socket.c:2404 [inline]
       __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
       __do_sys_sendmmsg net/socket.c:2519 [inline]
       __se_sys_sendmmsg net/socket.c:2516 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff88812cba150e of 2 bytes by task 24132 on cpu 1:
       inet_send_prepare+0x21/0x210 net/ipv4/af_inet.c:806
       inet6_sendmsg+0x29/0x80 net/ipv6/af_inet6.c:639
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
       ___sys_sendmsg net/socket.c:2404 [inline]
       __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
       __do_sys_sendmmsg net/socket.c:2519 [inline]
       __se_sys_sendmmsg net/socket.c:2516 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000 -> 0x9db4
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 24132 Comm: syz-executor.2 Not tainted 5.13.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dcd01eea
    • A
      net: ethtool: clear heap allocations for ethtool function · 80ec82e3
      Austin Kim 提交于
      Several ethtool functions leave heap uncleared (potentially) by
      drivers. This will leave the unused portion of heap unchanged and
      might copy the full contents back to userspace.
      Signed-off-by: NAustin Kim <austindh.kim@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      80ec82e3
    • F
      netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local · 12f36e9b
      Florian Westphal 提交于
      The ip6tables rpfilter match has an extra check to skip packets with
      "::" source address.
      
      Extend this to ipv6 fib expression.  Else ipv6 duplicate address detection
      packets will fail rpf route check -- lookup returns -ENETUNREACH.
      
      While at it, extend the prerouting check to also cover the ingress hook.
      
      Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1543
      Fixes: f6d0cbcf ("netfilter: nf_tables: add fib expression")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      12f36e9b
    • P
      netfilter: nf_tables: initialize set before expression setup · ad9f151e
      Pablo Neira Ayuso 提交于
      nft_set_elem_expr_alloc() needs an initialized set if expression sets on
      the NFT_EXPR_GC flag. Move set fields initialization before expression
      setup.
      
      [4512935.019450] ==================================================================
      [4512935.019456] BUG: KASAN: null-ptr-deref in nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
      [4512935.019487] Read of size 8 at addr 0000000000000070 by task nft/23532
      [4512935.019494] CPU: 1 PID: 23532 Comm: nft Not tainted 5.12.0-rc4+ #48
      [...]
      [4512935.019502] Call Trace:
      [4512935.019505]  dump_stack+0x89/0xb4
      [4512935.019512]  ? nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
      [4512935.019536]  ? nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
      [4512935.019560]  kasan_report.cold.12+0x5f/0xd8
      [4512935.019566]  ? nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
      [4512935.019590]  nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
      [4512935.019615]  nf_tables_newset+0xc7f/0x1460 [nf_tables]
      
      Reported-by: syzbot+ce96ca2b1d0b37c6422d@syzkaller.appspotmail.com
      Fixes: 65038428 ("netfilter: nf_tables: allow to specify stateful expression in set definition")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ad9f151e
  9. 09 6月, 2021 1 次提交