1. 09 Oct, 2020: 4 commits
    • xfrm: interface: fix the priorities for ipip and ipv6 tunnels · 7fe94612
      Committed by Xin Long
      As Nicolas noticed in his setup, when the xfrm_interface module is
      installed, the standard IP tunnels break when receiving packets.
      
      This is caused by the xfrm interface's IP tunnel handlers, which have a
      higher priority and process incoming packets via xfrm_input(). When
      anything goes wrong, xfrm_input() drops the packet and returns 0 instead
      of letting the other handlers try.
      
      Rather than changing xfrm_input(), this patch adjusts the priority of
      the IP tunnel handlers in the xfrm interface, so that packets reach
      xfrmi's handlers later than the others', as the other handlers do not
      drop packets they cannot process.
      
      Note that IPCOMP also defines its own IPIP tunnel handler, which calls
      xfrm_input() as well, so its priority must be lower than xfrmi's. This
      means that having xfrmi loaded would still break IPCOMP. We may seek
      another way to fix it in xfrm_input() in the future.
      Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Fixes: da9bbf05 ("xfrm: interface: support IPIP and IPIP6 tunnels processing with .cb_handler")
      Fixes: d7b360c2 ("xfrm: interface: support IP6IP6 and IP6IP tunnels processing with .cb_handler")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
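A minimal userspace sketch of the ordering problem this commit describes. It assumes a simplified receive chain in which handlers run in array order (standing in for priority order), a return value of 0 means the packet was consumed, and a non-zero value means "try the next handler". The handler functions and the integer packet kinds are hypothetical; `xfrmi_rcv()` only models the drop-and-return-0 behaviour attributed to xfrm_input() above.

```c
#include <stddef.h>

/* Which handler ended up consuming the packet (hypothetical bookkeeping). */
enum consumer { NONE, IPIP, XFRMI, DROPPED };
static enum consumer last_consumer;

/* Hypothetical packet kinds: 1 = plain IPIP, 2 = traffic for an xfrm interface. */
struct handler {
    const char *name;
    int (*rcv)(int pkt);    /* 0 = consumed, non-zero = not mine, try next */
};

/* Plain IP tunnel handler: declines packets it cannot process. */
static int ipip_rcv(int pkt)
{
    if (pkt != 1)
        return -1;
    last_consumer = IPIP;
    return 0;
}

/* Models xfrmi via xfrm_input(): on failure it drops the packet but still returns 0. */
static int xfrmi_rcv(int pkt)
{
    last_consumer = (pkt == 2) ? XFRMI : DROPPED;
    return 0;
}

/* Try handlers in order; the first one returning 0 ends the walk. */
static int dispatch(const struct handler *chain, size_t n, int pkt)
{
    last_consumer = NONE;
    for (size_t i = 0; i < n; i++)
        if (chain[i].rcv(pkt) == 0)
            return 0;
    return -1;
}
```

With xfrmi first, a plain IPIP packet ends up DROPPED; moving xfrmi to the end lets the IPIP handler see the packet first, which is the effect the priority change aims for.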
    • openvswitch: handle DNAT tuple collision · 8aa7b526
      Committed by Dumitru Ceara
      With multiple DNAT rules it's possible that after destination
      translation the resulting tuples collide.
      
      For example, two openvswitch flows:
      nw_dst=10.0.0.10,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
      nw_dst=10.0.0.20,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
      
      Assume two TCP clients initiate the following connections:
      10.0.0.10:5000->10.0.0.10:10
      10.0.0.10:5000->10.0.0.20:10
      
      Both tuples would translate to 10.0.0.10:5000->20.0.0.1:20 causing
      nf_conntrack_confirm() to fail because of tuple collision.
      
      Netfilter handles this case by allocating a null binding for SNAT at
      egress by default.  Perform the same operation in openvswitch for DNAT
      if no explicit SNAT is requested by the user and allocate a null binding
      for SNAT for packets in the "original" direction.
      
      Reported-at: https://bugzilla.redhat.com/1877128
      Suggested-by: Florian Westphal <fw@strlen.de>
      Fixes: 05752523 ("openvswitch: Interface with NAT.")
      Signed-off-by: Dumitru Ceara <dceara@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
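A minimal sketch of the collision in the example above and of what a "null binding" SNAT buys. Assumptions: tuples are plain structs with IPv4 addresses as host-order integers, the conntrack table is a small array of committed post-NAT tuples, and the null binding keeps the source unchanged when the tuple is unique, otherwise it walks to the next free source port. Netfilter's real port selection is more involved; `dnat()`, `snat_null_binding()` and `commit()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified connection tuple; addresses are IPv4 in host byte order. */
struct tuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

#define MAX_CONNS 16
static struct tuple committed[MAX_CONNS];   /* tuples already confirmed */
static size_t n_committed;

static bool tuple_eq(const struct tuple *a, const struct tuple *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

static bool tuple_in_use(const struct tuple *t)
{
    for (size_t i = 0; i < n_committed; i++)
        if (tuple_eq(&committed[i], t))
            return true;
    return false;
}

/* DNAT: rewrite only the destination address and port. */
static struct tuple dnat(struct tuple t, uint32_t daddr, uint16_t dport)
{
    t.daddr = daddr;
    t.dport = dport;
    return t;
}

/* "Null binding" SNAT: keep the source if the tuple is unique, otherwise
 * move to the next source port that does not collide. */
static bool snat_null_binding(struct tuple *t)
{
    uint16_t base = t->sport;
    for (uint32_t i = 0; i <= UINT16_MAX; i++) {
        t->sport = (uint16_t)(base + i);
        if (!tuple_in_use(t))
            return true;
    }
    t->sport = base;
    return false;
}

/* Confirm a tuple; fails on collision, like nf_conntrack_confirm() in the text. */
static bool commit(const struct tuple *t)
{
    if (n_committed == MAX_CONNS || tuple_in_use(t))
        return false;
    committed[n_committed++] = *t;
    return true;
}
```

In this model the first connection keeps its source port, the second would collide on commit, and the null binding moves it to source port 5001 so both connections can be confirmed.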
    • sctp: fix sctp_auth_init_hmacs() error path · d42ee76e
      Committed by Eric Dumazet
      After freeing ep->auth_hmacs we have to clear the pointer
      or risk use-after-free as reported by syzbot:
      
      BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline]
      BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
      BUG: KASAN: use-after-free in sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070
      Read of size 8 at addr ffff8880a8ff52c0 by task syz-executor941/6874
      
      CPU: 0 PID: 6874 Comm: syz-executor941 Not tainted 5.9.0-rc8-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x198/0x1fd lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
       sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline]
       sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
       sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070
       sctp_endpoint_destroy+0x95/0x240 net/sctp/endpointola.c:203
       sctp_endpoint_put net/sctp/endpointola.c:236 [inline]
       sctp_endpoint_free+0xd6/0x110 net/sctp/endpointola.c:183
       sctp_destroy_sock+0x9c/0x3c0 net/sctp/socket.c:4981
       sctp_v6_destroy_sock+0x11/0x20 net/sctp/socket.c:9415
       sk_common_release+0x64/0x390 net/core/sock.c:3254
       sctp_close+0x4ce/0x8b0 net/sctp/socket.c:1533
       inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
       inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:475
       __sock_release+0xcd/0x280 net/socket.c:596
       sock_close+0x18/0x20 net/socket.c:1277
       __fput+0x285/0x920 fs/file_table.c:281
       task_work_run+0xdd/0x190 kernel/task_work.c:141
       exit_task_work include/linux/task_work.h:25 [inline]
       do_exit+0xb7d/0x29f0 kernel/exit.c:806
       do_group_exit+0x125/0x310 kernel/exit.c:903
       __do_sys_exit_group kernel/exit.c:914 [inline]
       __se_sys_exit_group kernel/exit.c:912 [inline]
       __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:912
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x43f278
      Code: Bad RIP value.
      RSP: 002b:00007fffe0995c38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043f278
      RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
      RBP: 00000000004bf068 R08: 00000000000000e7 R09: ffffffffffffffd0
      R10: 0000000020000000 R11: 0000000000000246 R12: 0000000000000001
      R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000
      
      Allocated by task 6874:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
       kmem_cache_alloc_trace+0x174/0x300 mm/slab.c:3554
       kmalloc include/linux/slab.h:554 [inline]
       kmalloc_array include/linux/slab.h:593 [inline]
       kcalloc include/linux/slab.h:605 [inline]
       sctp_auth_init_hmacs+0xdb/0x3b0 net/sctp/auth.c:464
       sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049
       sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline]
       sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631
       __sys_setsockopt+0x2db/0x610 net/socket.c:2132
       __do_sys_setsockopt net/socket.c:2143 [inline]
       __se_sys_setsockopt net/socket.c:2140 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 6874:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
       kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
       kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
       __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
       __cache_free mm/slab.c:3422 [inline]
       kfree+0x10e/0x2b0 mm/slab.c:3760
       sctp_auth_destroy_hmacs net/sctp/auth.c:511 [inline]
       sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
       sctp_auth_init_hmacs net/sctp/auth.c:496 [inline]
       sctp_auth_init_hmacs+0x2b7/0x3b0 net/sctp/auth.c:454
       sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049
       sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline]
       sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631
       __sys_setsockopt+0x2db/0x610 net/socket.c:2132
       __do_sys_setsockopt net/socket.c:2143 [inline]
       __se_sys_setsockopt net/socket.c:2140 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 1f485649 ("[SCTP]: Implement SCTP-AUTH internals")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
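A minimal userspace sketch of the pattern the fix applies: after freeing the array on the error path, clear the owning pointer so that a later teardown cannot touch the freed block again. `struct endpoint`, `init_hmacs()` and `endpoint_free()` are hypothetical stand-ins, and the `fail` argument simply forces the error path.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct sctp_endpoint; only the relevant field. */
struct endpoint {
    void **auth_hmacs;
};

/* Allocate the HMAC array; on a (forced) later failure, undo the allocation. */
static int init_hmacs(struct endpoint *ep, size_t n, int fail)
{
    ep->auth_hmacs = calloc(n, sizeof(*ep->auth_hmacs));
    if (!ep->auth_hmacs)
        return -1;
    if (fail) {
        free(ep->auth_hmacs);
        ep->auth_hmacs = NULL;   /* the fix: do not leave a dangling pointer */
        return -1;
    }
    return 0;
}

/* Teardown runs later regardless of how init went; free(NULL) is a no-op. */
static void endpoint_free(struct endpoint *ep)
{
    free(ep->auth_hmacs);
    ep->auth_hmacs = NULL;
}
```

Without the `ep->auth_hmacs = NULL` line in the error path, `endpoint_free()` would operate on already-freed memory, the kind of bug KASAN reported above.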
    • bridge: Netlink interface fix. · b6c02ef5
      Committed by Henrik Bjoernlund
      This commit corrects the netlink function br_fill_ifinfo() so that it
      can handle a 'filter_mask' with multiple flags set.
      
      Fixes: 36a8e8e2 ("bridge: Extend br_fill_ifinfo to return MPR status")
      Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com>
      Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Suggested-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
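The commit message does not show the faulty check, so the following is only a generic illustration of the bug class it names: comparing a flag mask with `==` matches only when exactly one flag is set, while testing the individual bit with `&` also works when several flags are set. The flag names and values are hypothetical, not the real `RTEXT_FILTER_*` constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define FILTER_VLAN (1u << 1)
#define FILTER_MRP  (1u << 4)

/* Fragile pattern: equality only matches when MRP is the only flag set. */
static bool wants_mrp_eq(uint32_t mask)
{
    return mask == FILTER_MRP;
}

/* Robust pattern: test the individual bit, whatever else is set. */
static bool wants_mrp_bit(uint32_t mask)
{
    return (mask & FILTER_MRP) != 0;
}
```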
  2. 08 Oct, 2020: 1 commit
  3. 06 Oct, 2020: 4 commits
    • tcp: fix receive window update in tcp_add_backlog() · 86bccd03
      Committed by Eric Dumazet
      We got reports from GKE customers of flows being reset by netfilter
      conntrack unless nf_conntrack_tcp_be_liberal is set to 1.
      
      Traces seemed to suggest that ACK packets were being dropped by the
      packet capture or, more likely, that ACKs were received in the
      wrong order.
      
       wscale=7, SYN and SYNACK not shown here.
      
       This ACK allows the sender to send 1871*128 bytes from seq 51359321 :
       New right edge of the window -> 51359321+1871*128=51598809
      
       09:17:23.389210 IP A > B: Flags [.], ack 51359321, win 1871, options [nop,nop,TS val 10 ecr 999], length 0
      
       09:17:23.389212 IP B > A: Flags [.], seq 51422681:51424089, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 1408
       09:17:23.389214 IP A > B: Flags [.], ack 51422681, win 1376, options [nop,nop,TS val 10 ecr 999], length 0
       09:17:23.389253 IP B > A: Flags [.], seq 51424089:51488857, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 64768
       09:17:23.389272 IP A > B: Flags [.], ack 51488857, win 859, options [nop,nop,TS val 10 ecr 999], length 0
       09:17:23.389275 IP B > A: Flags [.], seq 51488857:51521241, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384
      
       The receiver now allows the sender to send 606*128=77568 bytes from seq 51521241 :
       New right edge of the window -> 51521241+606*128=51598809
      
       09:17:23.389296 IP A > B: Flags [.], ack 51521241, win 606, options [nop,nop,TS val 10 ecr 999], length 0
      
       09:17:23.389308 IP B > A: Flags [.], seq 51521241:51553625, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384
      
       It seems the sender exceeds the RWIN allowance, since 51611353 > 51598809
      
       09:17:23.389346 IP B > A: Flags [.], seq 51553625:51611353, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 57728
       09:17:23.389356 IP B > A: Flags [.], seq 51611353:51618393, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 7040
      
       09:17:23.389367 IP A > B: Flags [.], ack 51611353, win 0, options [nop,nop,TS val 10 ecr 999], length 0
      
       netfilter conntrack is not happy and sends RST
      
       09:17:23.389389 IP A > B: Flags [R], seq 92176528, win 0, length 0
       09:17:23.389488 IP B > A: Flags [R], seq 174478967, win 0, length 0
      
       Now imagine the ACKs were delivered out of order and tcp_add_backlog() set the window based on the wrong packet.
       New right edge of the window -> 51521241+859*128=51631193
      
      Normally TCP stack handles OOO packets just fine, but it
      turns out tcp_add_backlog() does not. It can update the window
      field of the aggregated packet even if the ACK sequence
      of the last received packet is too old.
      
      Many thanks to Alexandre Ferrieux for independently reporting the issue
      and suggesting a fix.
      
      Fixes: 4f693b55 ("tcp: implement coalescing on backlog queue")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
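The right-edge arithmetic in the trace, and the idea behind the fix, can be sketched as follows. `right_edge()` computes ack + (win << wscale), with wscale=7 as in the trace; `seq_before()` is the usual wraparound-safe sequence comparison (the same formula as the kernel's `before()` helper). `struct agg` and `coalesce_ack()` are a hypothetical model of the coalesced packet in the backlog: the window is only taken from a newly coalesced packet whose ACK is not older than the one already recorded. This mirrors the intent of the fix, not its exact code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is before b" for 32-bit TCP sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Right edge of the advertised receive window: ack + (win << wscale). */
static uint32_t right_edge(uint32_t ack, uint16_t win, unsigned wscale)
{
    return ack + ((uint32_t)win << wscale);
}

/* Hypothetical state of the aggregated packet: its ACK and raw window field. */
struct agg {
    uint32_t ack_seq;
    uint16_t window;
};

/* Take the window from a coalesced packet only if its ACK is not older. */
static void coalesce_ack(struct agg *tail, uint32_t ack_seq, uint16_t window)
{
    if (!seq_before(ack_seq, tail->ack_seq)) {
        tail->ack_seq = ack_seq;
        tail->window = window;
    }
}
```

The checks in the test reproduce the three right edges quoted in the trace and show that a late, older ACK (win 859) no longer overwrites the newer window (win 606).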
    • mptcp: more DATA FIN fixes · 017512a0
      Committed by Paolo Abeni
      Currently, a DATA_FIN on a data packet is not handled properly:
      the 'rcv_data_fin_seq' field is interpreted as the last
      sequence number carrying valid data, but for DATA_FIN
      packets with valid maps we currently store map_seq + map_len,
      that is, the next value.
      
      The 'write_seq' field, instead, carries the value subsequent
      to the last valid byte, so mptcp_write_data_fin() never
      correctly detects the last DSS map.
      
      Fixes: 7279da61 ("mptcp: Use MPTCP-level flag for sending DATA_FIN")
      Fixes: 1a49b2c2 ("mptcp: Handle incoming 32-bit DATA_FIN values")
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
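The off-by-one the message describes comes from mixing two conventions for the end of a mapping: the last byte it covers versus the first byte after it. A tiny sketch with hypothetical helper names; it does not model MPTCP's actual DATA_FIN accounting.

```c
#include <stdbool.h>
#include <stdint.h>

/* "Last byte" convention: sequence number of the final byte in the mapping. */
static uint64_t map_last_seq(uint64_t map_seq, uint32_t map_len)
{
    return map_seq + map_len - 1;
}

/* "Next byte" convention (like write_seq): first sequence number after it. */
static uint64_t map_end_seq(uint64_t map_seq, uint32_t map_len)
{
    return map_seq + map_len;
}

/* With a "next byte" cursor, the last mapping is the one ending exactly there. */
static bool is_last_map(uint64_t write_seq, uint64_t map_seq, uint32_t map_len)
{
    return map_end_seq(map_seq, map_len) == write_seq;
}
```

Comparing a "last byte" value against a "next byte" cursor is always off by one, which is how the last DSS map can be missed.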
    • net: qrtr: ns: Fix the incorrect usage of rcu_read_lock() · 082bb94f
      Committed by Manivannan Sadhasivam
      rcu_read_lock() must not be held across kernel_sendmsg(), because
      qrtr_sendmsg() calls lock_sock(), which may sleep. Hence, fix it by
      excluding kernel_sendmsg() from the RCU read-side critical section.
      
      While at it, let's also use radix_tree_deref_retry() to confirm the
      validity of the pointer returned by radix_tree_deref_slot() and use
      radix_tree_iter_resume() to resume iterating the tree properly before
      releasing the lock as suggested by Doug.
      
      Fixes: a7809ff9 ("net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks")
      Reported-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Douglas Anderson <dianders@chromium.org>
      Tested-by: Douglas Anderson <dianders@chromium.org>
      Tested-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
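A userspace analogy of the locking pattern: hold the read-side lock only while reading the table, drop it around the call that may sleep, and continue from a saved position afterwards. Here a boolean flag stands in for the RCU read-side critical section and the loop index stands in for the iterator state that radix_tree_iter_resume() handles in the kernel; `send_may_sleep()` and `notify_all()` are hypothetical and only model the ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for an RCU read-side critical section: just a flag. */
static bool read_locked;
static void read_lock(void)   { read_locked = true; }
static void read_unlock(void) { read_locked = false; }

static int sent;

/* Stand-in for kernel_sendmsg(): may sleep, so it must not run under the lock. */
static void send_may_sleep(int value)
{
    assert(!read_locked);
    (void)value;
    sent++;
}

/* Walk the table under the read lock, but drop the lock around each send
 * and resume from the saved index afterwards. */
static void notify_all(const int *table, size_t n)
{
    read_lock();
    for (size_t i = 0; i < n; i++) {
        int value = table[i];    /* copy what is needed while locked */
        read_unlock();
        send_may_sleep(value);
        read_lock();
    }
    read_unlock();
}
```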
    • rxrpc: Fix server keyring leak · 38b1dc47
      Committed by David Howells
      If someone calls setsockopt() twice to set a server key keyring, the first
      keyring is leaked.
      
      Fix it to return an error instead if the server key keyring is already set.
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Signed-off-by: David Howells <dhowells@redhat.com>
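A minimal sketch of the fixed behaviour, using a hypothetical `struct sock_state` and `set_server_keyring()`. The keyring is just a string here, and `-EINVAL` is an assumed error code chosen for illustration, not necessarily the one rxrpc returns.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical socket state holding a reference to the server keyring. */
struct sock_state {
    const char *server_keyring;
};

/* Refuse to replace an existing keyring instead of silently dropping (leaking) it. */
static int set_server_keyring(struct sock_state *s, const char *keyring)
{
    if (s->server_keyring)
        return -EINVAL;   /* assumed error code, for illustration */
    s->server_keyring = keyring;
    return 0;
}
```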
  4. 05 Oct, 2020: 7 commits
    • rxrpc: The server keyring isn't network-namespaced · fea99111
      Committed by David Howells
      The keyring containing the server's tokens isn't network-namespaced, so it
      shouldn't be looked up with a network namespace.  It is expected to be
      owned specifically by the server, so namespacing is unnecessary.
      
      Fixes: a58946c1 ("keys: Pass the network namespace into request_key mechanism")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix accept on a connection that need securing · 2d914c1b
      Committed by David Howells
      When a new incoming call arrives at a userspace rxrpc socket on a new
      connection that has a security class set, the code currently pushes it onto
      the accept queue to hold a ref on it for the socket.  This doesn't work,
      however, as recvmsg() pops it off, notices that it's in the SERVER_SECURING
      state and discards the ref.  This means that the call runs out of refs too
      early and the kernel oopses.
      
      By contrast, a kernel rxrpc socket manually pre-charges the incoming call
      pool with calls that already have user call IDs assigned, so they are ref'd
      by the call tree on the socket.
      
      Change the mode of operation for userspace rxrpc server sockets to work
      like this too.  Although this is a UAPI change, server sockets aren't
      currently functional.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix some missing _bh annotations on locking conn->state_lock · fa1d113a
      Committed by David Howells
      conn->state_lock may be taken in softirq mode, but a previous patch
      replaced an outer lock in the response-packet event handling code, and lost
      the _bh from that when doing so.
      
      Fix this by applying the _bh annotation to the state_lock locking.
      
      Fixes: a1399f8b ("rxrpc: Call channels should have separate call number spaces")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read() · 9a059cd5
      Committed by David Howells
      If rxrpc_read() (which allows KEYCTL_READ to read a key) sees a token of
      a type it doesn't recognise, it can BUG in a couple of places, which is
      unnecessary as it can easily get back to userspace.
      
      Fix this to print an error message instead.
      
      Fixes: 99455153 ("RxRPC: Parse security index 5 keys (Kerberos 5)")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix rxkad token xdr encoding · 56305118
      Committed by Marc Dionne
      The session key should be encoded with just the 8 data bytes and
      no length; ENCODE_DATA precedes it with a 4 byte length, which
      confuses some existing tools that try to parse this format.
      
      Add an ENCODE_BYTES macro that does not include a length, and use
      it for the key.  Also adjust the expected length.
      
      Note that commit 774521f3 ("rxrpc: Fix an assertion in
      rxrpc_read()") had fixed a BUG by changing the length rather than
      fixing the encoding.  The original length was correct.
      
      Fixes: 99455153 ("RxRPC: Parse security index 5 keys (Kerberos 5)")
      Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
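The difference between the two encodings can be sketched in plain C. Assumptions: XDR opaque data is zero-padded to a 4-byte boundary, `encode_data()` mirrors what the message says ENCODE_DATA does (a 4-byte big-endian length, then the bytes), and `encode_bytes()` mirrors the new ENCODE_BYTES (the bytes only, no length). These are hypothetical helpers, not the rxrpc macros.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value in big-endian (network) order. */
static size_t put_be32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
    return 4;
}

/* XDR pads opaque data to a multiple of 4 bytes. */
static size_t xdr_pad(size_t len)
{
    return (len + 3) & ~(size_t)3;
}

/* Length-prefixed opaque data: 4-byte length, bytes, zero padding. */
static size_t encode_data(uint8_t *out, const void *data, uint32_t len)
{
    size_t n = put_be32(out, len);
    memcpy(out + n, data, len);
    memset(out + n + len, 0, xdr_pad(len) - len);
    return n + xdr_pad(len);
}

/* Fixed-size opaque data: bytes and zero padding, no length prefix. */
static size_t encode_bytes(uint8_t *out, const void *data, uint32_t len)
{
    memcpy(out, data, len);
    memset(out + len, 0, xdr_pad(len) - len);
    return xdr_pad(len);
}
```

For the 8-byte session key, the length-prefixed form takes 12 bytes and the bare form 8, which is presumably why the expected length changes as well.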
    • net/core: check length before updating Ethertype in skb_mpls_{push,pop} · 4296adc3
      Committed by Guillaume Nault
      Openvswitch allows dropping a packet's Ethernet header, so
      skb_mpls_push() and skb_mpls_pop() might be called with ethernet=true
      and mac_len=0. In that case the pointer passed to skb_mod_eth_type()
      doesn't point to an Ethernet header and the new Ethertype is written at
      unexpected locations.
      
      Fix this by verifying that mac_len is big enough to contain an Ethernet
      header.
      
      Fixes: fa4e0f88 ("net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
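A minimal sketch of the check, assuming a raw byte buffer for the MAC header: the EtherType sits in bytes 12-13 of a 14-byte Ethernet header, so it may only be rewritten when the header is Ethernet and at least that long. `set_ethertype()` is a hypothetical helper, not `skb_mod_eth_type()`, and `ETH_HLEN` is defined locally with the standard value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN 14   /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Rewrite the EtherType only if the MAC header really is a full Ethernet header. */
static bool set_ethertype(uint8_t *mac, size_t mac_len, bool ethernet, uint16_t type)
{
    if (!ethernet || mac_len < ETH_HLEN)
        return false;                    /* leave the bytes untouched */
    mac[12] = (uint8_t)(type >> 8);      /* EtherType is big-endian at offset 12 */
    mac[13] = (uint8_t)(type & 0xff);
    return true;
}
```

With `mac_len == 0` (the Ethernet header was popped), the helper refuses to write, instead of scribbling over whatever happens to sit at that offset.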
    • net_sched: check error pointer in tcf_dump_walker() · 580e4273
      Committed by Cong Wang
      Although we take RTNL on the dump path, the insertion path
      can run without RTNL. So the following race condition
      is possible:
      
      rtnl_lock()		// no rtnl lock
      			mutex_lock(&idrinfo->lock);
      			// insert ERR_PTR(-EBUSY)
      			mutex_unlock(&idrinfo->lock);
      tc_dump_action()
      rtnl_unlock()
      
      So we have to skip those temporary -EBUSY entries on the dump path
      too.
      
      Reported-and-tested-by: syzbot+b47bc4f247856fb4d9e1@syzkaller.appspotmail.com
      Fixes: 0fedc63f ("net_sched: commit action insertions together")
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
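A userspace re-creation of the kernel's error-pointer convention (small negative error codes encoded in the top 4095 values of the address space) and of a dump loop that skips such placeholders. `err_ptr()`, `is_err()` and `dump_walk()` are hypothetical names modelled on `ERR_PTR()`/`IS_ERR()`, and the slot array stands in for the IDR.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno as a pointer value, as ERR_PTR() does. */
static void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* A pointer is an error if it falls in the top MAX_ERRNO addresses. */
static bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Count the entries a dump would emit, skipping empty slots and
 * temporary placeholders such as err_ptr(-EBUSY). */
static size_t dump_walk(void *const *slots, size_t n)
{
    size_t dumped = 0;
    for (size_t i = 0; i < n; i++) {
        if (!slots[i] || is_err(slots[i]))
            continue;
        dumped++;   /* a real walker would fill a netlink message here */
    }
    return dumped;
}
```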
  5. 03 Oct, 2020: 5 commits
  6. 02 Oct, 2020: 1 commit
  7. 30 Sep, 2020: 2 commits
  8. 29 Sep, 2020: 7 commits
    • ethtool: mark netlink family as __ro_after_init · 78b70155
      Committed by Jakub Kicinski
      Like all genl families, ethtool_genl_family cannot be a plain
      constant, because it is modified/initialized by
      genl_register_family(). After init, however, it is only
      passed to genlmsg_put() and co., so we can mark it
      as __ro_after_init.
      
      Since the genl_family structure contains function pointers,
      mark this as a fix.
      
      Fixes: 2b4a8990 ("ethtool: introduce ethtool netlink interface")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks · a7809ff9
      Committed by Manivannan Sadhasivam
      RCU read locks are needed to avoid a potential race condition while
      dereferencing the radix tree from multiple threads. The issue was
      identified by syzbot. Below is the crash report:
      
      =============================
      WARNING: suspicious RCU usage
      5.7.0-syzkaller #0 Not tainted
      -----------------------------
      include/linux/radix-tree.h:176 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by kworker/u4:1/21:
       #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: spin_unlock_irq include/linux/spinlock.h:403 [inline]
       #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: process_one_work+0x6df/0xfd0 kernel/workqueue.c:2241
       #1: ffffc90000dd7d80 ((work_completion)(&qrtr_ns.work)){+.+.}-{0:0}, at: process_one_work+0x71e/0xfd0 kernel/workqueue.c:2243
      
      stack backtrace:
      CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.7.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: qrtr_ns_handler qrtr_ns_worker
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1e9/0x30e lib/dump_stack.c:118
       radix_tree_deref_slot include/linux/radix-tree.h:176 [inline]
       ctrl_cmd_new_lookup net/qrtr/ns.c:558 [inline]
       qrtr_ns_worker+0x2aff/0x4500 net/qrtr/ns.c:674
       process_one_work+0x76e/0xfd0 kernel/workqueue.c:2268
       worker_thread+0xa7f/0x1450 kernel/workqueue.c:2414
       kthread+0x353/0x380 kernel/kthread.c:268
      
      Fixes: 0c2204a4 ("net: qrtr: Migrate nameservice to kernel from userspace")
      Reported-and-tested-by: syzbot+0f84f6eed90503da72fc@syzkaller.appspotmail.com
      Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: add nested_level variable in net_device · 1fc70edb
      Committed by Taehee Yoo
      This patch adds a new variable, 'nested_level', to the net_device
      structure.
      This variable will be used as the parameter of spin_lock_nested() for
      dev->addr_list_lock.
      
      netif_addr_lock() can be called recursively, so spin_lock_nested() is
      used instead of spin_lock(), with dev->lower_level as the parameter
      of spin_lock_nested().
      But dev->lower_level can be updated while it is being used,
      so lockdep would warn about a possible deadlock scenario.
      
      When a stacked interface is deleted, netif_{uc | mc}_sync() is
      called recursively, so spin_lock_nested() is called recursively too,
      with dev->lower_level as its parameter.
      dev->lower_level is updated immediately when interfaces are
      unlinked/linked.
      Thus, after unlinking, dev->lower_level should not be used as the
      parameter of spin_lock_nested().
      
          A (macvlan)
          |
          B (vlan)
          |
          C (bridge)
          |
          D (macvlan)
          |
          E (vlan)
          |
          F (bridge)
      
          A->lower_level : 6
          B->lower_level : 5
          C->lower_level : 4
          D->lower_level : 3
          E->lower_level : 2
          F->lower_level : 1
      
      When interface 'A' is removed, it releases its resources.
      At this point, netif_addr_lock() is called.
      Then netdev_upper_dev_unlink() is called recursively,
      and dev->lower_level is updated.
      There is no problem in this case.
      
      But when the bridge module is removed, interfaces 'C' and 'F'
      are removed at once.
      If 'F' is removed first, the lower_level values become:
          A->lower_level : 5
          B->lower_level : 4
          C->lower_level : 3
          D->lower_level : 2
          E->lower_level : 1
          F->lower_level : 1
      
      Then 'C' is removed. At this point, netif_addr_lock() is called
      recursively, in this order:
      C(3)->D(2)->E(1)->F(1)
      The lower_level values of 'E' and 'F' are now the same,
      so lockdep warns about a possible deadlock scenario.
      
      To avoid this problem, a new variable, 'nested_level', is added.
      Its value is the same as dev->lower_level - 1,
      but it is updated in rtnl_unlock(),
      so it can be used safely as the parameter of spin_lock_nested()
      in the rtnl context.
      
      Test commands:
         ip link add br0 type bridge vlan_filtering 1
         ip link add vlan1 link br0 type vlan id 10
         ip link add macvlan2 link vlan1 type macvlan
         ip link add br3 type bridge vlan_filtering 1
         ip link set macvlan2 master br3
         ip link add vlan4 link br3 type vlan id 10
         ip link add macvlan5 link vlan4 type macvlan
         ip link add br6 type bridge vlan_filtering 1
         ip link set macvlan5 master br6
         ip link add vlan7 link br6 type vlan id 10
         ip link add macvlan8 link vlan7 type macvlan
      
         ip link set br0 up
         ip link set vlan1 up
         ip link set macvlan2 up
         ip link set br3 up
         ip link set vlan4 up
         ip link set macvlan5 up
         ip link set br6 up
         ip link set vlan7 up
         ip link set macvlan8 up
         modprobe -rv bridge
      
      Splat looks like:
      [   36.057436][  T744] WARNING: possible recursive locking detected
      [   36.058848][  T744] 5.9.0-rc6+ #728 Not tainted
      [   36.059959][  T744] --------------------------------------------
      [   36.061391][  T744] ip/744 is trying to acquire lock:
      [   36.062590][  T744] ffff8c4767509280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_set_rx_mode+0x19/0x30
      [   36.064922][  T744]
      [   36.064922][  T744] but task is already holding lock:
      [   36.066626][  T744] ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
      [   36.068851][  T744]
      [   36.068851][  T744] other info that might help us debug this:
      [   36.070731][  T744]  Possible unsafe locking scenario:
      [   36.070731][  T744]
      [   36.072497][  T744]        CPU0
      [   36.073238][  T744]        ----
      [   36.074007][  T744]   lock(&vlan_netdev_addr_lock_key);
      [   36.075290][  T744]   lock(&vlan_netdev_addr_lock_key);
      [   36.076590][  T744]
      [   36.076590][  T744]  *** DEADLOCK ***
      [   36.076590][  T744]
      [   36.078515][  T744]  May be due to missing lock nesting notation
      [   36.078515][  T744]
      [   36.080491][  T744] 3 locks held by ip/744:
      [   36.081471][  T744]  #0: ffffffff98571df0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x236/0x490
      [   36.083614][  T744]  #1: ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
      [   36.085942][  T744]  #2: ffff8c476c8da280 (&bridge_netdev_addr_lock_key/4){+...}-{2:2}, at: dev_uc_sync+0x39/0x80
      [   36.088400][  T744]
      [   36.088400][  T744] stack backtrace:
      [   36.089772][  T744] CPU: 6 PID: 744 Comm: ip Not tainted 5.9.0-rc6+ #728
      [   36.091364][  T744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [   36.093630][  T744] Call Trace:
      [   36.094416][  T744]  dump_stack+0x77/0x9b
      [   36.095385][  T744]  __lock_acquire+0xbc3/0x1f40
      [   36.096522][  T744]  lock_acquire+0xb4/0x3b0
      [   36.097540][  T744]  ? dev_set_rx_mode+0x19/0x30
      [   36.098657][  T744]  ? rtmsg_ifinfo+0x1f/0x30
      [   36.099711][  T744]  ? __dev_notify_flags+0xa5/0xf0
      [   36.100874][  T744]  ? rtnl_is_locked+0x11/0x20
      [   36.101967][  T744]  ? __dev_set_promiscuity+0x7b/0x1a0
      [   36.103230][  T744]  _raw_spin_lock_bh+0x38/0x70
      [   36.104348][  T744]  ? dev_set_rx_mode+0x19/0x30
      [   36.105461][  T744]  dev_set_rx_mode+0x19/0x30
      [   36.106532][  T744]  dev_set_promiscuity+0x36/0x50
      [   36.107692][  T744]  __dev_set_promiscuity+0x123/0x1a0
      [   36.108929][  T744]  dev_set_promiscuity+0x1e/0x50
      [   36.110093][  T744]  br_port_set_promisc+0x1f/0x40 [bridge]
      [   36.111415][  T744]  br_manage_promisc+0x8b/0xe0 [bridge]
      [   36.112728][  T744]  __dev_set_promiscuity+0x123/0x1a0
      [   36.113967][  T744]  ? __hw_addr_sync_one+0x23/0x50
      [   36.115135][  T744]  __dev_set_rx_mode+0x68/0x90
      [   36.116249][  T744]  dev_uc_sync+0x70/0x80
      [   36.117244][  T744]  dev_uc_add+0x50/0x60
      [   36.118223][  T744]  macvlan_open+0x18e/0x1f0 [macvlan]
      [   36.119470][  T744]  __dev_open+0xd6/0x170
      [   36.120470][  T744]  __dev_change_flags+0x181/0x1d0
      [   36.121644][  T744]  dev_change_flags+0x23/0x60
      [   36.122741][  T744]  do_setlink+0x30a/0x11e0
      [   36.123778][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.124929][  T744]  ? __nla_validate_parse.part.6+0x45/0x8e0
      [   36.126309][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.127457][  T744]  __rtnl_newlink+0x546/0x8e0
      [   36.128560][  T744]  ? lock_acquire+0xb4/0x3b0
      [   36.129623][  T744]  ? deactivate_slab.isra.85+0x6a1/0x850
      [   36.130946][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.132102][  T744]  ? lock_acquire+0xb4/0x3b0
      [   36.133176][  T744]  ? is_bpf_text_address+0x5/0xe0
      [   36.134364][  T744]  ? rtnl_newlink+0x2e/0x70
      [   36.135445][  T744]  ? rcu_read_lock_sched_held+0x32/0x60
      [   36.136771][  T744]  ? kmem_cache_alloc_trace+0x2d8/0x380
      [   36.138070][  T744]  ? rtnl_newlink+0x2e/0x70
      [   36.139164][  T744]  rtnl_newlink+0x47/0x70
      [ ... ]
      
      Fixes: 845e0ebb ("net: change addr_list_lock back to static key")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
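The lower_level numbers in the example above can be reproduced with a small sketch. Assumptions: the stack is a single chain stored top-down in an array, a device with no lower device has lower_level 1 and each layer above adds one, and lockdep's complaint is modelled simply as "two locks in the same acquisition chain share a subclass". Both helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* level[0] is the top device (A); level[n - 1] is the bottom one (F). */
static void compute_lower_levels(int *level, size_t n)
{
    for (size_t i = n; i-- > 0;)
        level[i] = (i == n - 1) ? 1 : level[i + 1] + 1;
}

/* Locks taken along level[from..n-1] must all use different subclasses. */
static bool subclasses_distinct(const int *level, size_t from, size_t n)
{
    for (size_t i = from; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (level[i] == level[j])
                return false;
    return true;
}
```

The healthy A..F stack yields 6..1 with no repeats; after 'F' is removed first, the chain C(3)->D(2)->E(1)->F(1) repeats a subclass, matching the scenario lockdep warns about.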
    • net: core: introduce struct netdev_nested_priv for nested interface infrastructure · eff74233
      Committed by Taehee Yoo
      Functions related to the nested interface infrastructure, such as
      netdev_walk_all_{ upper | lower }_dev(), take both a private callback
      function and a "data" pointer so that callers can do their own work.
      Currently, the data pointer type is void *.
      To make it easier to add common variables and functions,
      this patch adds a new netdev_nested_priv structure.
      
      In the following patch, a new member variable will be added to this
      struct to fix the lockdep issue.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
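A minimal sketch of why wrapping the callback's `void *data` in a structure helps: shared fields can be added to the wrapper later without changing every callback signature. `struct dev`, `walk_all_lower()`, `count_cb()` and the `flags` field are hypothetical, and the walk follows a simple linear chain rather than the kernel's adjacency lists.

```c
#include <stddef.h>

/* Hypothetical device node with at most one lower device. */
struct dev {
    const char *name;
    struct dev *lower;
};

/* Wrapper around the caller's private pointer. */
struct nested_priv {
    unsigned char flags;   /* hypothetical shared field */
    void *data;            /* the caller's own data, as before */
};

/* Call fn on every lower device; stop early if fn returns non-zero. */
static int walk_all_lower(struct dev *d,
                          int (*fn)(struct dev *, struct nested_priv *),
                          struct nested_priv *priv)
{
    for (struct dev *l = d->lower; l; l = l->lower) {
        int ret = fn(l, priv);
        if (ret)
            return ret;
    }
    return 0;
}

/* Example callback: count lower devices through priv->data. */
static int count_cb(struct dev *d, struct nested_priv *priv)
{
    (void)d;
    (*(int *)priv->data)++;
    return 0;
}
```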
    • net: core: add __netdev_upper_dev_unlink() · fe8300fd
      Committed by Taehee Yoo
      netdev_upper_dev_unlink() has to behave differently depending on flags,
      following the same idea as __netdev_upper_dev_link().
      
      In the following patches, new flags will be added.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fe8300fd
    •
      net_sched: remove a redundant goto chain check · 1aad8049
      Cong Wang committed
      All TC actions call tcf_action_check_ctrlact() to validate the
      goto chain, so this check in tcf_action_init_1() is actually
      redundant. Remove it to avoid the trouble of leaking memory.
      
      Fixes: e49d8c22 ("net_sched: defer tcf_idr_insert() in tcf_action_init_1()")
      Reported-by: Vlad Buslov <vladbu@mellanox.com>
      Suggested-by: Davide Caratti <dcaratti@redhat.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1aad8049
    •
      net: bridge: fdb: don't flush ext_learn entries · f2f3729f
      Nikolay Aleksandrov committed
      When user-space software manages fdb entries externally, it should
      set the ext_learn flag, which marks the fdb entry as externally managed
      and prevents it from expiring (such entries are treated as static fdbs).
      Unfortunately, on events where fdb entries are flushed (STP down, netlink
      fdb flush, etc.) these fdbs are also deleted automatically by the bridge.
      That in turn causes trouble for the managing user-space software (e.g. in
      MLAG setups we lose remote fdb entries on port flaps).
      These entries are completely externally managed, so we should avoid
      deleting them automatically. The only exception is offloaded entries
      (i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED), which are flushed
      as before.
      
      Fixes: eb100e0e ("net: bridge: allow to add externally learned entries from user-space")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f2f3729f
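The flush rule described above reduces to a small predicate. A flush skips an entry that was added by external learning, unless that entry is also offloaded.

The sketch below assumes simplified bitmask flags (`FDB_ADDED_BY_EXT_LEARN`, `FDB_OFFLOADED`) and hypothetical helpers `fdb_flush_may_delete()` and `fdb_flush()`. It is not the bridge's actual flush code, and it ignores other criteria such as static or local entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag masks; the bridge itself uses bit numbers such as
 * BR_FDB_ADDED_BY_EXT_LEARN and BR_FDB_OFFLOADED with test_bit(). */
enum {
	FDB_ADDED_BY_EXT_LEARN = 1u << 0,
	FDB_OFFLOADED          = 1u << 1,
};

/* May a flush event (STP down, netlink fdb flush, ...) delete this entry?
 * Externally learned entries survive unless they are also offloaded.
 * Other flush criteria (static or local entries) are not modeled. */
static bool fdb_flush_may_delete(unsigned int flags)
{
	bool ext_learn = flags & FDB_ADDED_BY_EXT_LEARN;
	bool offloaded = flags & FDB_OFFLOADED;

	return !ext_learn || offloaded;
}

/* Flush an array of entry flags in place, keeping the survivors in order.
 * Returns the number of entries kept. */
static size_t fdb_flush(unsigned int *entries, size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (!fdb_flush_may_delete(entries[i]))
			entries[kept++] = entries[i];
	return kept;
}
```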
  9. 25 Sep 2020, 5 commits
    •
      xfrm: Use correct address family in xfrm_state_find · e94ee171
      Herbert Xu committed
      The struct flowi must never be interpreted by itself as its size
      depends on the address family.  Therefore it must always be grouped
      with its original family value.
      
      In this particular instance, the original family value is lost in
      the function xfrm_state_find.  Therefore we get a bogus read when
      it's coupled with the wrong family which would occur with inter-
      family xfrm states.
      
      This patch fixes it by keeping the original family value.
      
      Note that the same bug could potentially occur in LSM through
      the xfrm_state_pol_flow_match hook.  I checked the current code
      there and it seems to be safe for now as only secid is used which
      is part of struct flowi_common.  But that API should be changed
      so that we don't get new bugs in the future.  We could
      do that by replacing fl with just secid or adding a family field.
      
      Reported-by: syzbot+577fbac3145a6eb2e7a5@syzkaller.appspotmail.com
      Fixes: 48b8d783 ("[XFRM]: State selection update to use inner...")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      e94ee171
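This example shows why a flow key must travel with its family. The number of meaningful bytes in the key depends on the family, so any code that walks the key needs the family the key was built for.

In this hypothetical sketch, `struct flow4` and `struct flow6` are simplified stand-ins for the per-family parts of `struct flowi`. An FNV-1a hash stands in for whatever code reads the key. None of these names come from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-family flow keys; not the kernel's struct flowi. */
struct flow4 { uint32_t secid; uint32_t saddr, daddr; };
struct flow6 { uint32_t secid; uint8_t saddr[16], daddr[16]; };

/* Values chosen to match Linux AF_INET / AF_INET6. */
enum family { FAM_INET = 2, FAM_INET6 = 10 };

/* The family alone determines how many bytes of the key are valid. */
static size_t flow_key_len(enum family fam)
{
	return fam == FAM_INET6 ? sizeof(struct flow6) : sizeof(struct flow4);
}

/* FNV-1a hash over the flow key. The caller must pass the family the key
 * was built for; passing a different one (e.g. the family of an
 * inter-family state) would read past the end of a struct flow4. */
static uint32_t flow_hash(const void *fl, enum family fam)
{
	const unsigned char *p = fl;
	uint32_t h = 2166136261u;  /* FNV-1a offset basis */

	for (size_t i = 0; i < flow_key_len(fam); i++) {
		h ^= p[i];
		h *= 16777619u;    /* FNV-1a prime */
	}
	return h;
}
```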
    •
      tcp: skip DSACKs with dubious sequence ranges · ad2b9b0f
      Priyaranjan Jha committed
      Currently, we use the length of the DSACKed range to compute the number
      of delivered packets. If the sequence range in a DSACK is corrupted,
      we can get bogus dsacked/acked counts and a bogus cwnd.
      
      This patch puts bounds on the DSACKed range and skips updating data
      delivery and spurious retransmission information when the DSACK
      is unlikely to have been caused by the sender's own action:
      - The DSACKed range shouldn't be greater than the maximum advertised rwnd.
      - The total number of DSACKed segments shouldn't be greater than the
        total number of retransmitted segments. Unlike spurious retransmits,
        network duplicates or corrupted DSACKs shouldn't be counted as delivery.
      Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ad2b9b0f
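The two bounds can be combined into one check. It either credits some number of duplicate segments or rejects the DSACK block as dubious.

This is a simplified C sketch, not the patch's code. `struct conn`, its fields and `dsack_dup_segs()` are hypothetical. Segments are counted by rounding the range up to whole MSS-sized segments, and the exact accounting order in the kernel may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of per-connection sender state. */
struct conn {
	uint32_t max_window;    /* largest window the peer advertised, in bytes */
	uint32_t mss;           /* current MSS, in bytes; must be nonzero */
	uint32_t total_retrans; /* segments this sender has retransmitted */
	uint32_t dsack_dups;    /* duplicate segments accepted from DSACKs so far */
};

/* Wrap-safe "a is before b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Returns the number of duplicate segments to credit for the DSACK block
 * [start, end), or 0 if the block looks dubious and should be skipped. */
static uint32_t dsack_dup_segs(struct conn *c, uint32_t start, uint32_t end)
{
	assert(c->mss > 0);
	if (!seq_before(start, end))
		return 0;                      /* empty or inverted range */

	uint32_t len = end - start;
	if (len > c->max_window)
		return 0;                      /* bound 1: larger than any advertised rwnd */

	uint32_t segs = len / c->mss + (len % c->mss != 0); /* round up */
	if (c->dsack_dups + segs > c->total_retrans)
		return 0;                      /* bound 2: more dups than we ever retransmitted */

	c->dsack_dups += segs;
	return segs;
}
```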
    •
      net/tls: race causes kernel panic · 38f7e1c0
      Rohit Maheshwari committed
      BUG: kernel NULL pointer dereference, address: 00000000000000b8
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 80000008b6fef067 P4D 80000008b6fef067 PUD 8b6fe6067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 12 PID: 23871 Comm: kworker/12:80 Kdump: loaded Tainted: G S
       5.9.0-rc3+ #1
       Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.1 03/29/2018
       Workqueue: events tx_work_handler [tls]
       RIP: 0010:tx_work_handler+0x1b/0x70 [tls]
       Code: dc fe ff ff e8 16 d4 a3 f6 66 0f 1f 44 00 00 0f 1f 44 00 00 55 53 48 8b
       6f 58 48 8b bd a0 04 00 00 48 85 ff 74 1c 48 8b 47 28 <48> 8b 90 b8 00 00 00 83
       e2 02 75 0c f0 48 0f ba b0 b8 00 00 00 00
       RSP: 0018:ffffa44ace61fe88 EFLAGS: 00010286
       RAX: 0000000000000000 RBX: ffff91da9e45cc30 RCX: dead000000000122
       RDX: 0000000000000001 RSI: ffff91da9e45cc38 RDI: ffff91d95efac200
       RBP: ffff91da133fd780 R08: 0000000000000000 R09: 000073746e657665
       R10: 8080808080808080 R11: 0000000000000000 R12: ffff91dad7d30700
       R13: ffff91dab6561080 R14: 0ffff91dad7d3070 R15: ffff91da9e45cc38
       FS:  0000000000000000(0000) GS:ffff91dad7d00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00000000000000b8 CR3: 0000000906478003 CR4: 00000000003706e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        process_one_work+0x1a7/0x370
        worker_thread+0x30/0x370
        ? process_one_work+0x370/0x370
        kthread+0x114/0x130
        ? kthread_park+0x80/0x80
        ret_from_fork+0x22/0x30
      
      tls_sw_release_resources_tx() waits for encrypt_pending, which
      can race, so we need changes similar to those in commit
      0cada332 here as well.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      38f7e1c0
    •
      net_sched: commit action insertions together · 0fedc63f
      Cong Wang committed
      syzbot is able to trigger a failure case inside the loop in
      tcf_action_init(), and when this happens we clean up with
      tcf_action_destroy(). But, as these actions are already inserted
      into the global IDR, another parallel process could free them
      before tcf_action_destroy(), and then we trigger a use-after-free.
      
      Fix this by deferring the insertions even later, after the loop,
      and committing all the insertions in a separate loop, so we will
      never fail in the middle of the insertions any more.
      
      One side effect is that the window between allocation and final
      insertion becomes larger, so it is now more likely that the loop in
      tcf_del_walker() sees the placeholder -EBUSY pointer. So we have
      to check for an error pointer in tcf_del_walker().
      
      Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com
      Fixes: 0190c1d4 ("net: sched: atomically check-allocate action")
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0fedc63f
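The fix above separates fallible setup from publication. First, reserve an ID with a placeholder and build every action. Only after all of them succeed are they published, in a loop that cannot fail. Lookups and walkers must then skip placeholders.

This single-threaded C model is a hypothetical sketch of that ordering, and no real concurrency is modeled. The names are illustrative, not the tc action API: `struct table`, `SLOT_RESERVED` (standing in for the `-EBUSY` placeholder), `init_actions()` and `delete_all()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_IDS 8

/* Hypothetical ID table standing in for the tc action IDR. A slot is
 * empty, reserved (the analogue of the -EBUSY placeholder), or holds a
 * published action that others may look up and free. */
enum slot_state { SLOT_EMPTY, SLOT_RESERVED, SLOT_PUBLISHED };

struct action { int id; };

struct table {
	enum slot_state state[MAX_IDS];
	struct action *obj[MAX_IDS];
};

/* Phase 1 reserves every ID and builds every action; a failure here only
 * undoes private state, because nothing is visible yet. Phase 2 publishes
 * everything in a loop that cannot fail. */
static bool init_actions(struct table *t, const int *ids, size_t n,
			 struct action **out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int id = ids[i];

		if (id < 0 || id >= MAX_IDS || t->state[id] != SLOT_EMPTY)
			goto undo;
		out[i] = malloc(sizeof(*out[i]));
		if (!out[i])
			goto undo;
		out[i]->id = id;
		t->state[id] = SLOT_RESERVED;
	}
	for (i = 0; i < n; i++) {
		t->obj[ids[i]] = out[i];
		t->state[ids[i]] = SLOT_PUBLISHED;
	}
	return true;

undo:
	while (i-- > 0) {          /* release entries 0..i-1 reserved so far */
		t->state[ids[i]] = SLOT_EMPTY;
		free(out[i]);
		out[i] = NULL;
	}
	return false;
}

/* Analogue of the tcf_del_walker() change: skip reserved placeholders. */
static size_t delete_all(struct table *t)
{
	size_t deleted = 0;

	for (int id = 0; id < MAX_IDS; id++) {
		if (t->state[id] != SLOT_PUBLISHED)
			continue;  /* empty, or still being set up by someone else */
		free(t->obj[id]);
		t->obj[id] = NULL;
		t->state[id] = SLOT_EMPTY;
		deleted++;
	}
	return deleted;
}
```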
    •
      net_sched: defer tcf_idr_insert() in tcf_action_init_1() · e49d8c22
      Cong Wang committed
      All TC actions call tcf_idr_insert() for a new action at the end
      of their ->init(), so we can move it to a central place
      in tcf_action_init_1().
      
      Once the action is inserted into the global IDR, another parallel
      process could free it immediately, as its refcnt is still 1, so we
      cannot fail after this point. We therefore move the insertion after
      the goto action validation to avoid handling the failure case after
      insertion.
      
      This was found during code review; it was not directly triggered by
      syzbot. It also prepares for the next patch.
      
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e49d8c22
  10. 24 Sep 2020, 2 commits
  11. 22 Sep 2020, 2 commits
    •
      inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute · d5e4d0a5
      Eric Dumazet committed
      User space could send an invalid INET_DIAG_REQ_PROTOCOL attribute
      as caught by syzbot.
      
      BUG: KMSAN: uninit-value in inet_diag_lock_handler net/ipv4/inet_diag.c:55 [inline]
      BUG: KMSAN: uninit-value in __inet_diag_dump+0x58c/0x720 net/ipv4/inet_diag.c:1147
      CPU: 0 PID: 8505 Comm: syz-executor174 Not tainted 5.9.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x21c/0x280 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:219
       inet_diag_lock_handler net/ipv4/inet_diag.c:55 [inline]
       __inet_diag_dump+0x58c/0x720 net/ipv4/inet_diag.c:1147
       inet_diag_dump_compat+0x2a5/0x380 net/ipv4/inet_diag.c:1254
       netlink_dump+0xb73/0x1cb0 net/netlink/af_netlink.c:2246
       __netlink_dump_start+0xcf2/0xea0 net/netlink/af_netlink.c:2354
       netlink_dump_start include/linux/netlink.h:246 [inline]
       inet_diag_rcv_msg_compat+0x5da/0x6c0 net/ipv4/inet_diag.c:1288
       sock_diag_rcv_msg+0x24f/0x620 net/core/sock_diag.c:256
       netlink_rcv_skb+0x6d7/0x7e0 net/netlink/af_netlink.c:2470
       sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:275
       netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
       netlink_unicast+0x11c8/0x1490 net/netlink/af_netlink.c:1330
       netlink_sendmsg+0x173a/0x1840 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg net/socket.c:671 [inline]
       ____sys_sendmsg+0xc82/0x1240 net/socket.c:2353
       ___sys_sendmsg net/socket.c:2407 [inline]
       __sys_sendmsg+0x6d1/0x820 net/socket.c:2440
       __do_sys_sendmsg net/socket.c:2449 [inline]
       __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
       do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x441389
      Code: e8 fc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fff3b02ce98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441389
      RDX: 0000000000000000 RSI: 0000000020001500 RDI: 0000000000000003
      RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000402130
      R13: 00000000004021c0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:143 [inline]
       kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:126
       kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:80
       slab_alloc_node mm/slub.c:2907 [inline]
       __kmalloc_node_track_caller+0x9aa/0x12f0 mm/slub.c:4511
       __kmalloc_reserve net/core/skbuff.c:142 [inline]
       __alloc_skb+0x35f/0xb30 net/core/skbuff.c:210
       alloc_skb include/linux/skbuff.h:1094 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline]
       netlink_sendmsg+0xdb9/0x1840 net/netlink/af_netlink.c:1894
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg net/socket.c:671 [inline]
       ____sys_sendmsg+0xc82/0x1240 net/socket.c:2353
       ___sys_sendmsg net/socket.c:2407 [inline]
       __sys_sendmsg+0x6d1/0x820 net/socket.c:2440
       __do_sys_sendmsg net/socket.c:2449 [inline]
       __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
       do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 3f935c75 ("inet_diag: support for wider protocol numbers")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Christoph Paasch <cpaasch@apple.com>
      Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d5e4d0a5
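The class of bug here is reading a fixed-size value from a netlink attribute without first checking that the attribute is long enough.

Below is a minimal userspace sketch of that length check. It assumes a TLV header shaped like `struct nlattr`: a 16-bit length that includes the 4-byte header, followed by a 16-bit type. `struct attr_hdr` and `attr_get_u32()` are hypothetical, and this is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header shaped like struct nlattr; the length field
 * counts the 4-byte header plus the payload. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_HDRLEN sizeof(struct attr_hdr)

/* Read a u32 payload only if the attribute really carries 4 payload bytes.
 * Without the length check, a short attribute would make us read bytes
 * the sender never supplied (the uninit-value KMSAN reports above). */
static bool attr_get_u32(const unsigned char *buf, size_t buflen, uint32_t *out)
{
	struct attr_hdr h;

	if (buflen < ATTR_HDRLEN)
		return false;
	memcpy(&h, buf, ATTR_HDRLEN);
	if (h.len > buflen || h.len < ATTR_HDRLEN + sizeof(uint32_t))
		return false;  /* truncated, or too short to hold a u32 */
	memcpy(out, buf + ATTR_HDRLEN, sizeof(uint32_t));
	return true;
}
```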
    •
      net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU · 99f62a74
      Vladimir Oltean committed
      When calling the RCU variant of br_vlan_get_pvid(), lockdep warns:
      
      =============================
      WARNING: suspicious RCU usage
      5.9.0-rc3-01631-g13c17acb8e38-dirty #814 Not tainted
      -----------------------------
      net/bridge/br_private.h:1054 suspicious rcu_dereference_protected() usage!
      
      Call trace:
       lockdep_rcu_suspicious+0xd4/0xf8
       __br_vlan_get_pvid+0xc0/0x100
       br_vlan_get_pvid_rcu+0x78/0x108
      
      The warning is because br_vlan_get_pvid_rcu() calls nbp_vlan_group()
      which calls rtnl_dereference() instead of rcu_dereference(). In turn,
      rtnl_dereference() calls rcu_dereference_protected() which assumes
      operation under an RCU write-side critical section, which obviously is
      not the case here. So, when the incorrect primitive is used to access
      the RCU-protected VLAN group pointer, READ_ONCE() is not used, which may
      cause various unexpected problems.
      
      I'm sad to say that br_vlan_get_pvid() and br_vlan_get_pvid_rcu() cannot
      share the same implementation. So fix the bug by splitting the 2
      functions, and making br_vlan_get_pvid_rcu() retrieve the VLAN groups
      under proper locking annotations.
      
      Fixes: 7582f5b7 ("bridge: add br_vlan_get_pvid_rcu()")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      99f62a74
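The split described above can be loosely modeled in userspace with C11 atomics. One accessor is for the update side: it is only valid with the writer lock held, and it checks that, as lockdep does. The other accessor is for the read side and needs no lock.

Everything here is hypothetical: `port_vg`, `writer_lock_held`, `get_pvid()` and `get_pvid_rcu()`. The acquire load is only a conservative approximation of `rcu_dereference()`. A real RCU reader would also use `rcu_read_lock()`/`rcu_read_unlock()`, with freeing deferred past a grace period; this single-threaded model leaves both out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct vlan_group { unsigned short pvid; };

/* Hypothetical shared pointer: written under a writer lock (the rtnl
 * analogue) and read locklessly by readers (the RCU read-side analogue). */
static _Atomic(struct vlan_group *) port_vg;

/* Hypothetical stand-in for lockdep knowing whether the writer lock is held. */
static bool writer_lock_held;

/* Writer publishes a fully initialized group with a release store. */
static void vg_publish(struct vlan_group *vg)
{
	atomic_store_explicit(&port_vg, vg, memory_order_release);
}

/* Update-side accessor, in the role of rtnl_dereference(): only valid
 * with the writer lock held, and it checks that like lockdep would. */
static struct vlan_group *vg_deref_locked(void)
{
	assert(writer_lock_held && "suspicious dereference without the lock");
	return atomic_load_explicit(&port_vg, memory_order_relaxed);
}

/* Read-side accessor, in the role of rcu_dereference(): no lock needed;
 * the acquire load is a conservative approximation. */
static struct vlan_group *vg_deref_rcu(void)
{
	return atomic_load_explicit(&port_vg, memory_order_acquire);
}

/* Two separate entry points, each using the accessor that matches its
 * context, rather than one shared body that assumes the lock is held. */
static unsigned short get_pvid(void)
{
	struct vlan_group *vg = vg_deref_locked();

	return vg ? vg->pvid : 0;
}

static unsigned short get_pvid_rcu(void)
{
	struct vlan_group *vg = vg_deref_rcu();

	return vg ? vg->pvid : 0;
}
```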