1. 10 11月, 2019 10 次提交
    • E
      net: use skb_queue_empty_lockless() in busy poll contexts · 4f3df7f1
      Eric Dumazet 提交于
      [ Upstream commit 3f926af3f4d688e2e11e7f8ed04e277a14d4d4a4 ]
      
      Busy polling usually runs without locks.
      Let's use skb_queue_empty_lockless() instead of skb_queue_empty()
      
      Also uses READ_ONCE() in __skb_try_recv_datagram() to address
      a similar potential problem.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f3df7f1
    • E
      net: use skb_queue_empty_lockless() in poll() handlers · eaf548fe
      Eric Dumazet 提交于
      [ Upstream commit 3ef7cf57c72f32f61e97f8fa401bc39ea1f1a5d4 ]
      
      Many poll() handlers are lockless. Using skb_queue_empty_lockless()
      instead of skb_queue_empty() is more appropriate.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eaf548fe
    • E
      udp: use skb_queue_empty_lockless() · afa1f5e9
      Eric Dumazet 提交于
      [ Upstream commit 137a0dbe3426fd7bcfe3f8117b36a87b3590e4eb ]
      
      syzbot reported a data-race [1].
      
      We should use skb_queue_empty_lockless() to document that we are
      not ensuring a mutual exclusion and silence KCSAN.
      
      [1]
      BUG: KCSAN: data-race in __skb_recv_udp / __udp_enqueue_schedule_skb
      
      write to 0xffff888122474b50 of 8 bytes by interrupt on cpu 0:
       __skb_insert include/linux/skbuff.h:1852 [inline]
       __skb_queue_before include/linux/skbuff.h:1958 [inline]
       __skb_queue_tail include/linux/skbuff.h:1991 [inline]
       __udp_enqueue_schedule_skb+0x2c1/0x410 net/ipv4/udp.c:1470
       __udp_queue_rcv_skb net/ipv4/udp.c:1940 [inline]
       udp_queue_rcv_one_skb+0x7bd/0xc70 net/ipv4/udp.c:2057
       udp_queue_rcv_skb+0xb5/0x400 net/ipv4/udp.c:2074
       udp_unicast_rcv_skb.isra.0+0x7e/0x1c0 net/ipv4/udp.c:2233
       __udp4_lib_rcv+0xa44/0x17c0 net/ipv4/udp.c:2300
       udp_rcv+0x2b/0x40 net/ipv4/udp.c:2470
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
      
      read to 0xffff888122474b50 of 8 bytes by task 8921 on cpu 1:
       skb_queue_empty include/linux/skbuff.h:1494 [inline]
       __skb_recv_udp+0x18d/0x500 net/ipv4/udp.c:1653
       udp_recvmsg+0xe1/0xb10 net/ipv4/udp.c:1712
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 8921 Comm: syz-executor.4 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      afa1f5e9
    • E
      udp: fix data-race in udp_set_dev_scratch() · a8a5adbb
      Eric Dumazet 提交于
      [ Upstream commit a793183caa9afae907a0d7ddd2ffd57329369bf5 ]
      
      KCSAN reported a data-race in udp_set_dev_scratch() [1]
      
      The issue here is that we must not write over skb fields
      if skb is shared. A similar issue has been fixed in commit
      89c22d8c ("net: Fix skb csum races when peeking")
      
      While we are at it, use a helper only dealing with
      udp_skb_scratch(skb)->csum_unnecessary, as this allows
      udp_set_dev_scratch() to be called once and thus inlined.
      
      [1]
      BUG: KCSAN: data-race in udp_set_dev_scratch / udpv6_recvmsg
      
      write to 0xffff888120278317 of 1 bytes by task 10411 on cpu 1:
       udp_set_dev_scratch+0xea/0x200 net/ipv4/udp.c:1308
       __first_packet_length+0x147/0x420 net/ipv4/udp.c:1556
       first_packet_length+0x68/0x2a0 net/ipv4/udp.c:1579
       udp_poll+0xea/0x110 net/ipv4/udp.c:2720
       sock_poll+0xed/0x250 net/socket.c:1256
       vfs_poll include/linux/poll.h:90 [inline]
       do_select+0x7d0/0x1020 fs/select.c:534
       core_sys_select+0x381/0x550 fs/select.c:677
       do_pselect.constprop.0+0x11d/0x160 fs/select.c:759
       __do_sys_pselect6 fs/select.c:784 [inline]
       __se_sys_pselect6 fs/select.c:769 [inline]
       __x64_sys_pselect6+0x12e/0x170 fs/select.c:769
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      read to 0xffff888120278317 of 1 bytes by task 10413 on cpu 0:
       udp_skb_csum_unnecessary include/net/udp.h:358 [inline]
       udpv6_recvmsg+0x43e/0xe90 net/ipv6/udp.c:310
       inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592
       sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 10413 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 2276f58a ("udp: use a separate rx queue for packet reception")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8a5adbb
    • Z
      net: Zeroing the structure ethtool_wolinfo in ethtool_get_wol() · 321c9915
      zhanglin 提交于
      [ Upstream commit 5ff223e86f5addbfae26419cbb5d61d98f6fbf7d ]
      
      memset() the structure ethtool_wolinfo that has padded bytes
      but the padded bytes have not been zeroed out.
      Signed-off-by: Nzhanglin <zhang.lin16@zte.com.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      321c9915
    • G
      netns: fix GFP flags in rtnl_net_notifyid() · 40400fdd
      Guillaume Nault 提交于
      [ Upstream commit d4e4fdf9e4a27c87edb79b1478955075be141f67 ]
      
      In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to
      rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances,
      but there are a few paths calling rtnl_net_notifyid() from atomic
      context or from RCU critical sections. The later also precludes the use
      of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new()
      call is wrong too, as it uses GFP_KERNEL unconditionally.
      
      Therefore, we need to pass the GFP flags as parameter and propagate it
      through function calls until the proper flags can be determined.
      
      In most cases, GFP_KERNEL is fine. The exceptions are:
        * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump()
          indirectly call rtnl_net_notifyid() from RCU critical section,
      
        * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as
          parameter.
      
      Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used
      by nlmsg_new(). The function is allowed to sleep, so better make the
      flags consistent with the ones used in the following
      ovs_vport_cmd_fill_info() call.
      
      Found by code inspection.
      
      Fixes: 9a963454 ("netns: notify netns id events")
      Signed-off-by: NGuillaume Nault <gnault@redhat.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      40400fdd
    • E
      net: annotate accesses to sk->sk_incoming_cpu · 0cfaf03c
      Eric Dumazet 提交于
      [ Upstream commit 7170a977743b72cf3eb46ef6ef89885dc7ad3621 ]
      
      This socket field can be read and written by concurrent cpus.
      
      Use READ_ONCE() and WRITE_ONCE() annotations to document this,
      and avoid some compiler 'optimizations'.
      
      KCSAN reported :
      
      BUG: KCSAN: data-race in tcp_v4_rcv / tcp_v4_rcv
      
      write to 0xffff88812220763c of 4 bytes by interrupt on cpu 0:
       sk_incoming_cpu_update include/net/sock.h:953 [inline]
       tcp_v4_rcv+0x1b3c/0x1bb0 net/ipv4/tcp_ipv4.c:1934
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
       napi_poll net/core/dev.c:6392 [inline]
       net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
       do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337
       do_softirq kernel/softirq.c:329 [inline]
       __local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189
      
      read to 0xffff88812220763c of 4 bytes by interrupt on cpu 1:
       sk_incoming_cpu_update include/net/sock.h:952 [inline]
       tcp_v4_rcv+0x181a/0x1bb0 net/ipv4/tcp_ipv4.c:1934
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
       napi_poll net/core/dev.c:6392 [inline]
       net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
       smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cfaf03c
    • E
      inet: stop leaking jiffies on the wire · 07de7389
      Eric Dumazet 提交于
      [ Upstream commit a904a0693c189691eeee64f6c6b188bd7dc244e9 ]
      
      Historically linux tried to stick to RFC 791, 1122, 2003
      for IPv4 ID field generation.
      
      RFC 6864 made clear that no matter how hard we try,
      we can not ensure unicity of IP ID within maximum
      lifetime for all datagrams with a given source
      address/destination address/protocol tuple.
      
      Linux uses a per socket inet generator (inet_id), initialized
      at connection startup with a XOR of 'jiffies' and other
      fields that appear clear on the wire.
      
      Thiemo Nagel pointed that this strategy is a privacy
      concern as this provides 16 bits of entropy to fingerprint
      devices.
      
      Let's switch to a random starting point, this is just as
      good as far as RFC 6864 is concerned and does not leak
      anything critical.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NThiemo Nagel <tnagel@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07de7389
    • X
      erspan: fix the tun_info options_len check for erspan · 163901dc
      Xin Long 提交于
      [ Upstream commit 2eb8d6d2910cfe3dc67dc056f26f3dd9c63d47cd ]
      
      The check for !md doens't really work for ip_tunnel_info_opts(info) which
      only does info + 1. Also to avoid out-of-bounds access on info, it should
      ensure options_len is not less than erspan_metadata in both erspan_xmit()
      and ip6erspan_tunnel_xmit().
      
      Fixes: 1a66a836 ("gre: add collect_md mode to ERSPAN tunnel")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      163901dc
    • E
      dccp: do not leak jiffies on the wire · 96df1ec2
      Eric Dumazet 提交于
      [ Upstream commit 3d1e5039f5f87a8731202ceca08764ee7cb010d3 ]
      
      For some reason I missed the case of DCCP passive
      flows in my previous patch.
      
      Fixes: a904a0693c18 ("inet: stop leaking jiffies on the wire")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NThiemo Nagel <tnagel@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96df1ec2
  2. 06 11月, 2019 9 次提交
    • E
      sch_netem: fix rcu splat in netem_enqueue() · a6c91087
      Eric Dumazet 提交于
      commit 159d2c7d8106177bd9a986fd005a311fe0d11285 upstream.
      
      qdisc_root() use from netem_enqueue() triggers a lockdep warning.
      
      __dev_queue_xmit() uses rcu_read_lock_bh() which is
      not equivalent to rcu_read_lock() + local_bh_disable_bh as far
      as lockdep is concerned.
      
      WARNING: suspicious RCU usage
      5.3.0-rc7+ #0 Not tainted
      -----------------------------
      include/net/sch_generic.h:492 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      3 locks held by syz-executor427/8855:
       #0: 00000000b5525c01 (rcu_read_lock_bh){....}, at: lwtunnel_xmit_redirect include/net/lwtunnel.h:92 [inline]
       #0: 00000000b5525c01 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x2dc/0x2570 net/ipv4/ip_output.c:214
       #1: 00000000b5525c01 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x20a/0x3650 net/core/dev.c:3804
       #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: spin_lock include/linux/spinlock.h:338 [inline]
       #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: __dev_xmit_skb net/core/dev.c:3502 [inline]
       #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: __dev_queue_xmit+0x14b8/0x3650 net/core/dev.c:3838
      
      stack backtrace:
      CPU: 0 PID: 8855 Comm: syz-executor427 Not tainted 5.3.0-rc7+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5357
       qdisc_root include/net/sch_generic.h:492 [inline]
       netem_enqueue+0x1cfb/0x2d80 net/sched/sch_netem.c:479
       __dev_xmit_skb net/core/dev.c:3527 [inline]
       __dev_queue_xmit+0x15d2/0x3650 net/core/dev.c:3838
       dev_queue_xmit+0x18/0x20 net/core/dev.c:3902
       neigh_hh_output include/net/neighbour.h:500 [inline]
       neigh_output include/net/neighbour.h:509 [inline]
       ip_finish_output2+0x1726/0x2570 net/ipv4/ip_output.c:228
       __ip_finish_output net/ipv4/ip_output.c:308 [inline]
       __ip_finish_output+0x5fc/0xb90 net/ipv4/ip_output.c:290
       ip_finish_output+0x38/0x1f0 net/ipv4/ip_output.c:318
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip_mc_output+0x292/0xf40 net/ipv4/ip_output.c:417
       dst_output include/net/dst.h:436 [inline]
       ip_local_out+0xbb/0x190 net/ipv4/ip_output.c:125
       ip_send_skb+0x42/0xf0 net/ipv4/ip_output.c:1555
       udp_send_skb.isra.0+0x6b2/0x1160 net/ipv4/udp.c:887
       udp_sendmsg+0x1e96/0x2820 net/ipv4/udp.c:1174
       inet_sendmsg+0x9e/0xe0 net/ipv4/af_inet.c:807
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:657
       ___sys_sendmsg+0x3e2/0x920 net/socket.c:2311
       __sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439
       do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a6c91087
    • D
      rxrpc: Fix trace-after-put looking at the put peer record · 8d9c4a9b
      David Howells 提交于
      commit 55f6c98e3674ce16038a1949c3f9ca5a9a99f289 upstream.
      
      rxrpc_put_peer() calls trace_rxrpc_peer() after it has done the decrement
      of the refcount - which looks at the debug_id in the peer record.  But
      unless the refcount was reduced to zero, we no longer have the right to
      look in the record and, indeed, it may be deleted by some other thread.
      
      Fix this by getting the debug_id out before decrementing the refcount and
      then passing that into the tracepoint.
      
      This can cause the following symptoms:
      
          BUG: KASAN: use-after-free in __rxrpc_put_peer net/rxrpc/peer_object.c:411
          [inline]
          BUG: KASAN: use-after-free in rxrpc_put_peer+0x685/0x6a0
          net/rxrpc/peer_object.c:435
          Read of size 8 at addr ffff888097ec0058 by task syz-executor823/24216
      
      Fixes: 1159d4b4 ("rxrpc: Add a tracepoint to track rxrpc_peer refcounting")
      Reported-by: syzbot+b9be979c55f2bea8ed30@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d9c4a9b
    • D
      rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record · e8e51ce7
      David Howells 提交于
      commit 9ebeddef58c41bd700419cdcece24cf64ce32276 upstream.
      
      The rxrpc_peer record needs to hold a reference on the rxrpc_local record
      it points as the peer is used as a base to access information in the
      rxrpc_local record.
      
      This can cause problems in __rxrpc_put_peer(), where we need the network
      namespace pointer, and in rxrpc_send_keepalive(), where we need to access
      the UDP socket, leading to symptoms like:
      
          BUG: KASAN: use-after-free in __rxrpc_put_peer net/rxrpc/peer_object.c:411
          [inline]
          BUG: KASAN: use-after-free in rxrpc_put_peer+0x685/0x6a0
          net/rxrpc/peer_object.c:435
          Read of size 8 at addr ffff888097ec0058 by task syz-executor823/24216
      
      Fix this by taking a ref on the local record for the peer record.
      
      Fixes: ace45bec ("rxrpc: Fix firewall route keepalive")
      Fixes: 2baec2c3 ("rxrpc: Support network namespacing")
      Reported-by: syzbot+b9be979c55f2bea8ed30@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8e51ce7
    • D
      rxrpc: Fix call ref leak · 570ab0dd
      David Howells 提交于
      commit c48fc11b69e95007109206311b0187a3090591f3 upstream.
      
      When sendmsg() finds a call to continue on with, if the call is in an
      inappropriate state, it doesn't release the ref it just got on that call
      before returning an error.
      
      This causes the following symptom to show up with kasan:
      
      	BUG: KASAN: use-after-free in rxrpc_send_keepalive+0x8a2/0x940
      	net/rxrpc/output.c:635
      	Read of size 8 at addr ffff888064219698 by task kworker/0:3/11077
      
      where line 635 is:
      
      	whdr.epoch	= htonl(peer->local->rxnet->epoch);
      
      The local endpoint (which cannot be pinned by the call) has been released,
      but not the peer (which is pinned by the call).
      
      Fix this by releasing the call in the error path.
      
      Fixes: 37411cad ("rxrpc: Fix potential NULL-pointer exception")
      Reported-by: syzbot+d850c266e3df14da1d31@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      570ab0dd
    • E
      llc: fix sk_buff leak in llc_conn_service() · d634bd01
      Eric Biggers 提交于
      commit b74555de21acd791f12c4a1aeaf653dd7ac21133 upstream.
      
      syzbot reported:
      
          BUG: memory leak
          unreferenced object 0xffff88811eb3de00 (size 224):
             comm "syz-executor559", pid 7315, jiffies 4294943019 (age 10.300s)
             hex dump (first 32 bytes):
               00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
               00 a0 38 24 81 88 ff ff 00 c0 f2 15 81 88 ff ff  ..8$............
             backtrace:
               [<000000008d1c66a1>] kmemleak_alloc_recursive  include/linux/kmemleak.h:55 [inline]
               [<000000008d1c66a1>] slab_post_alloc_hook mm/slab.h:439 [inline]
               [<000000008d1c66a1>] slab_alloc_node mm/slab.c:3269 [inline]
               [<000000008d1c66a1>] kmem_cache_alloc_node+0x153/0x2a0 mm/slab.c:3579
               [<00000000447d9496>] __alloc_skb+0x6e/0x210 net/core/skbuff.c:198
               [<000000000cdbf82f>] alloc_skb include/linux/skbuff.h:1058 [inline]
               [<000000000cdbf82f>] llc_alloc_frame+0x66/0x110 net/llc/llc_sap.c:54
               [<000000002418b52e>] llc_conn_ac_send_sabme_cmd_p_set_x+0x2f/0x140  net/llc/llc_c_ac.c:777
               [<000000001372ae17>] llc_exec_conn_trans_actions net/llc/llc_conn.c:475  [inline]
               [<000000001372ae17>] llc_conn_service net/llc/llc_conn.c:400 [inline]
               [<000000001372ae17>] llc_conn_state_process+0x1ac/0x640  net/llc/llc_conn.c:75
               [<00000000f27e53c1>] llc_establish_connection+0x110/0x170  net/llc/llc_if.c:109
               [<00000000291b2ca0>] llc_ui_connect+0x10e/0x370 net/llc/af_llc.c:477
               [<000000000f9c740b>] __sys_connect+0x11d/0x170 net/socket.c:1840
               [...]
      
      The bug is that most callers of llc_conn_send_pdu() assume it consumes a
      reference to the skb, when actually due to commit b85ab56c ("llc:
      properly handle dev_queue_xmit() return value") it doesn't.
      
      Revert most of that commit, and instead make the few places that need
      llc_conn_send_pdu() to *not* consume a reference call skb_get() before.
      
      Fixes: b85ab56c ("llc: properly handle dev_queue_xmit() return value")
      Reported-by: syzbot+6b825a6494a04cc0e3f7@syzkaller.appspotmail.com
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d634bd01
    • E
      llc: fix sk_buff leak in llc_sap_state_process() · 3f3f7409
      Eric Biggers 提交于
      commit c6ee11c39fcc1fb55130748990a8f199e76263b4 upstream.
      
      syzbot reported:
      
          BUG: memory leak
          unreferenced object 0xffff888116270800 (size 224):
             comm "syz-executor641", pid 7047, jiffies 4294947360 (age 13.860s)
             hex dump (first 32 bytes):
               00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
               00 20 e1 2a 81 88 ff ff 00 40 3d 2a 81 88 ff ff  . .*.....@=*....
             backtrace:
               [<000000004d41b4cc>] kmemleak_alloc_recursive  include/linux/kmemleak.h:55 [inline]
               [<000000004d41b4cc>] slab_post_alloc_hook mm/slab.h:439 [inline]
               [<000000004d41b4cc>] slab_alloc_node mm/slab.c:3269 [inline]
               [<000000004d41b4cc>] kmem_cache_alloc_node+0x153/0x2a0 mm/slab.c:3579
               [<00000000506a5965>] __alloc_skb+0x6e/0x210 net/core/skbuff.c:198
               [<000000001ba5a161>] alloc_skb include/linux/skbuff.h:1058 [inline]
               [<000000001ba5a161>] alloc_skb_with_frags+0x5f/0x250  net/core/skbuff.c:5327
               [<0000000047d9c78b>] sock_alloc_send_pskb+0x269/0x2a0  net/core/sock.c:2225
               [<000000003828fe54>] sock_alloc_send_skb+0x32/0x40 net/core/sock.c:2242
               [<00000000e34d94f9>] llc_ui_sendmsg+0x10a/0x540 net/llc/af_llc.c:933
               [<00000000de2de3fb>] sock_sendmsg_nosec net/socket.c:652 [inline]
               [<00000000de2de3fb>] sock_sendmsg+0x54/0x70 net/socket.c:671
               [<000000008fe16e7a>] __sys_sendto+0x148/0x1f0 net/socket.c:1964
      	 [...]
      
      The bug is that llc_sap_state_process() always takes an extra reference
      to the skb, but sometimes neither llc_sap_next_state() nor
      llc_sap_state_process() itself drops this reference.
      
      Fix it by changing llc_sap_next_state() to never consume a reference to
      the skb, rather than sometimes do so and sometimes not.  Then remove the
      extra skb_get() and kfree_skb() from llc_sap_state_process().
      
      Reported-by: syzbot+6bf095f9becf5efef645@syzkaller.appspotmail.com
      Reported-by: syzbot+31c16aa4202dace3812e@syzkaller.appspotmail.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f3f7409
    • S
      batman-adv: Avoid free/alloc race when handling OGM buffer · 948e8eba
      Sven Eckelmann 提交于
      commit 40e220b4218bb3d278e5e8cc04ccdfd1c7ff8307 upstream.
      
      Each slave interface of an B.A.T.M.A.N. IV virtual interface has an OGM
      packet buffer which is initialized using data from netdevice notifier and
      other rtnetlink related hooks. It is sent regularly via various slave
      interfaces of the batadv virtual interface and in this process also
      modified (realloced) to integrate additional state information via TVLV
      containers.
      
      It must be avoided that the worker item is executed without a common lock
      with the netdevice notifier/rtnetlink helpers. Otherwise it can either
      happen that half modified/freed data is sent out or functions modifying the
      OGM buffer try to access already freed memory regions.
      
      Reported-by: syzbot+0cc629f19ccb8534935b@syzkaller.appspotmail.com
      Fixes: c6c8fea2 ("net: Add batman-adv meshing protocol")
      Signed-off-by: NSven Eckelmann <sven@narfation.org>
      Signed-off-by: NSimon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      948e8eba
    • M
      nl80211: fix validation of mesh path nexthop · be87ee68
      Markus Theil 提交于
      commit 1fab1b89e2e8f01204a9c05a39fd0b6411a48593 upstream.
      
      Mesh path nexthop should be a ethernet address, but current validation
      checks against 4 byte integers.
      
      Cc: stable@vger.kernel.org
      Fixes: 2ec600d6 ("nl80211/cfg80211: support for mesh, sta dumping")
      Signed-off-by: NMarkus Theil <markus.theil@tu-ilmenau.de>
      Link: https://lore.kernel.org/r/20191029093003.10355-1-markus.theil@tu-ilmenau.deSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      be87ee68
    • S
      netfilter: ipset: Make invalid MAC address checks consistent · d8187ff3
      Stefano Brivio 提交于
      [ Upstream commit 29edbc3ebdb0faa934114f14bf12fc0b784d4f1b ]
      
      Set types bitmap:ipmac and hash:ipmac check that MAC addresses
      are not all zeroes.
      
      Introduce one missing check, and make the remaining ones
      consistent, using is_zero_ether_addr() instead of comparing
      against an array containing zeroes.
      
      This was already done for hash:mac sets in commit 26c97c5d
      ("netfilter: ipset: Use is_zero_ether_addr instead of static and
      memcmp").
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d8187ff3
  3. 29 10月, 2019 10 次提交
    • W
      mac80211: Reject malformed SSID elements · 24ca6289
      Will Deacon 提交于
      commit 4152561f5da3fca92af7179dd538ea89e248f9d0 upstream.
      
      Although this shouldn't occur in practice, it's a good idea to bounds
      check the length field of the SSID element prior to using it for things
      like allocations or memcpy operations.
      
      Cc: <stable@vger.kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Reported-by: NNicolas Waisman <nico@semmle.com>
      Signed-off-by: NWill Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20191004095132.15777-1-will@kernel.orgSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24ca6289
    • W
      cfg80211: wext: avoid copying malformed SSIDs · 73c066a9
      Will Deacon 提交于
      commit 4ac2813cc867ae563a1ba5a9414bfb554e5796fa upstream.
      
      Ensure the SSID element is bounds-checked prior to invoking memcpy()
      with its length field, when copying to userspace.
      
      Cc: <stable@vger.kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Reported-by: NNicolas Waisman <nico@semmle.com>
      Signed-off-by: NWill Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20191004095132.15777-2-will@kernel.org
      [adjust commit log a bit]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      73c066a9
    • X
      sctp: change sctp_prot .no_autobind with true · 2770f80a
      Xin Long 提交于
      [ Upstream commit 63dfb7938b13fa2c2fbcb45f34d065769eb09414 ]
      
      syzbot reported a memory leak:
      
        BUG: memory leak, unreferenced object 0xffff888120b3d380 (size 64):
        backtrace:
      
          [...] slab_alloc mm/slab.c:3319 [inline]
          [...] kmem_cache_alloc+0x13f/0x2c0 mm/slab.c:3483
          [...] sctp_bucket_create net/sctp/socket.c:8523 [inline]
          [...] sctp_get_port_local+0x189/0x5a0 net/sctp/socket.c:8270
          [...] sctp_do_bind+0xcc/0x200 net/sctp/socket.c:402
          [...] sctp_bindx_add+0x4b/0xd0 net/sctp/socket.c:497
          [...] sctp_setsockopt_bindx+0x156/0x1b0 net/sctp/socket.c:1022
          [...] sctp_setsockopt net/sctp/socket.c:4641 [inline]
          [...] sctp_setsockopt+0xaea/0x2dc0 net/sctp/socket.c:4611
          [...] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3147
          [...] __sys_setsockopt+0x10f/0x220 net/socket.c:2084
          [...] __do_sys_setsockopt net/socket.c:2100 [inline]
      
      It was caused by when sending msgs without binding a port, in the path:
      inet_sendmsg() -> inet_send_prepare() -> inet_autobind() ->
      .get_port/sctp_get_port(), sp->bind_hash will be set while bp->port is
      not. Later when binding another port by sctp_setsockopt_bindx(), a new
      bucket will be created as bp->port is not set.
      
      sctp's autobind is supposed to call sctp_autobind() where it does all
      things including setting bp->port. Since sctp_autobind() is called in
      sctp_sendmsg() if the sk is not yet bound, it should have skipped the
      auto bind.
      
      THis patch is to avoid calling inet_autobind() in inet_send_prepare()
      by changing sctp_prot .no_autobind with true, also remove the unused
      .get_port.
      
      Reported-by: syzbot+d44f7bbebdea49dbc84a@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2770f80a
    • X
      net: ipv6: fix listify ip6_rcv_finish in case of forwarding · da4f0aed
      Xin Long 提交于
      [ Upstream commit c7a42eb49212f93a800560662d17d5293960d3c3 ]
      
      We need a similar fix for ipv6 as Commit 0761680d ("net: ipv4: fix
      listify ip_rcv_finish in case of forwarding") does for ipv4.
      
      This issue can be reprocuded by syzbot since Commit 323ebb61e32b ("net:
      use listified RX for handling GRO_NORMAL skbs") on net-next. The call
      trace was:
      
        kernel BUG at include/linux/skbuff.h:2225!
        RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
        RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
        Call Trace:
          sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
          sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
          sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
          sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
          sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
          ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
          ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
          NF_HOOK include/linux/netfilter.h:305 [inline]
          NF_HOOK include/linux/netfilter.h:299 [inline]
          ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
          dst_input include/net/dst.h:442 [inline]
          ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
          ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline]
          ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282
          ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316
          __netif_receive_skb_list_ptype net/core/dev.c:5049 [inline]
          __netif_receive_skb_list_core+0x5fc/0x9d0 net/core/dev.c:5097
          __netif_receive_skb_list net/core/dev.c:5149 [inline]
          netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244
          gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757
          gro_normal_list net/core/dev.c:5755 [inline]
          gro_normal_one net/core/dev.c:5769 [inline]
          napi_frags_finish net/core/dev.c:5782 [inline]
          napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855
          tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974
          tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020
      
      Fixes: d8269e2c ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
      Fixes: 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs")
      Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com
      Reported-by: syzbot+4a0643a653ac375612d1@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da4f0aed
    • E
      net: avoid potential infinite loop in tc_ctl_action() · 16d67aca
      Eric Dumazet 提交于
      [ Upstream commit 39f13ea2f61b439ebe0060393e9c39925c9ee28c ]
      
      tc_ctl_action() has the ability to loop forever if tcf_action_add()
      returns -EAGAIN.
      
      This special case has been done in case a module needed to be loaded,
      but it turns out that tcf_add_notify() could also return -EAGAIN
      if the socket sk_rcvbuf limit is hit.
      
      We need to separate the two cases, and only loop for the module
      loading case.
      
      While we are at it, add a limit of 10 attempts since unbounded
      loops are always scary.
      
      syzbot repro was something like :
      
      socket(PF_NETLINK, SOCK_RAW|SOCK_NONBLOCK, NETLINK_ROUTE) = 3
      write(3, ..., 38) = 38
      setsockopt(3, SOL_SOCKET, SO_RCVBUF, [0], 4) = 0
      sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{..., 388}], msg_controllen=0, msg_flags=0x10}, ...)
      
      NMI backtrace for cpu 0
      CPU: 0 PID: 1054 Comm: khungtaskd Not tainted 5.4.0-rc1+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
       nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
       arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
       trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
       check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
       watchdog+0x9d0/0xef0 kernel/hung_task.c:289
       kthread+0x361/0x430 kernel/kthread.c:255
       ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
      Sending NMI from CPU 0 to CPUs 1:
      NMI backtrace for cpu 1
      CPU: 1 PID: 8859 Comm: syz-executor910 Not tainted 5.4.0-rc1+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:arch_local_save_flags arch/x86/include/asm/paravirt.h:751 [inline]
      RIP: 0010:lockdep_hardirqs_off+0x1df/0x2e0 kernel/locking/lockdep.c:3453
      Code: 5c 08 00 00 5b 41 5c 41 5d 5d c3 48 c7 c0 58 1d f3 88 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 d3 00 00 00 <48> 83 3d 21 9e 99 07 00 0f 84 b9 00 00 00 9c 58 0f 1f 44 00 00 f6
      RSP: 0018:ffff8880a6f3f1b8 EFLAGS: 00000046
      RAX: 1ffffffff11e63ab RBX: ffff88808c9c6080 RCX: 0000000000000000
      RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff88808c9c6914
      RBP: ffff8880a6f3f1d0 R08: ffff88808c9c6080 R09: fffffbfff16be5d1
      R10: fffffbfff16be5d0 R11: 0000000000000003 R12: ffffffff8746591f
      R13: ffff88808c9c6080 R14: ffffffff8746591f R15: 0000000000000003
      FS:  00000000011e4880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffff600400 CR3: 00000000a8920000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       trace_hardirqs_off+0x62/0x240 kernel/trace/trace_preemptirq.c:45
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
       _raw_spin_lock_irqsave+0x6f/0xcd kernel/locking/spinlock.c:159
       __wake_up_common_lock+0xc8/0x150 kernel/sched/wait.c:122
       __wake_up+0xe/0x10 kernel/sched/wait.c:142
       netlink_unlock_table net/netlink/af_netlink.c:466 [inline]
       netlink_unlock_table net/netlink/af_netlink.c:463 [inline]
       netlink_broadcast_filtered+0x705/0xb80 net/netlink/af_netlink.c:1514
       netlink_broadcast+0x3a/0x50 net/netlink/af_netlink.c:1534
       rtnetlink_send+0xdd/0x110 net/core/rtnetlink.c:714
       tcf_add_notify net/sched/act_api.c:1343 [inline]
       tcf_action_add+0x243/0x370 net/sched/act_api.c:1362
       tc_ctl_action+0x3b5/0x4bc net/sched/act_api.c:1410
       rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5386
       netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5404
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:657
       ___sys_sendmsg+0x803/0x920 net/socket.c:2311
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2356
       __do_sys_sendmsg net/socket.c:2365 [inline]
       __se_sys_sendmsg net/socket.c:2363 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363
       do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440939
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: syzbot+cf0adbb9c28c8866c788@syzkaller.appspotmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16d67aca
    • S
      ipv4: Return -ENETUNREACH if we can't create route but saddr is valid · 2fa80e64
      Stefano Brivio 提交于
      [ Upstream commit 595e0651d0296bad2491a4a29a7a43eae6328b02 ]
      
      ...instead of -EINVAL. An issue was found with older kernel versions
      while unplugging a NFS client with pending RPCs, and the wrong error
      code here prevented it from recovering once link is back up with a
      configured address.
      
      Incidentally, this is not an issue anymore since commit 4f8943f80883
      ("SUNRPC: Replace direct task wakeups from softirq context"), included
      in 5.2-rc7, had the effect of decoupling the forwarding of this error
      by using SO_ERROR in xs_wake_error(), as pointed out by Benjamin
      Coddington.
      
      To the best of my knowledge, this isn't currently causing any further
      issue, but the error code doesn't look appropriate anyway, and we
      might hit this in other paths as well.
      
      In detail, as analysed by Gonzalo Siero, once the route is deleted
      because the interface is down, and can't be resolved and we return
      -EINVAL here, this ends up, courtesy of inet_sk_rebuild_header(),
      as the socket error seen by tcp_write_err(), called by
      tcp_retransmit_timer().
      
      In turn, tcp_write_err() indirectly calls xs_error_report(), which
      wakes up the RPC pending tasks with a status of -EINVAL. This is then
      seen by call_status() in the SUN RPC implementation, which aborts the
      RPC call calling rpc_exit(), instead of handling this as a
      potentially temporary condition, i.e. as a timeout.
      
      Return -EINVAL only if the input parameters passed to
      ip_route_output_key_hash_rcu() are actually invalid (this is the case
      if the specified source address is multicast, limited broadcast or
      all zeroes), but return -ENETUNREACH in all cases where, at the given
      moment, the given source address doesn't allow resolving the route.
      
      While at it, drop the initialisation of err to -ENETUNREACH, which
      was added to __ip_route_output_key() back then by commit
      0315e382 ("net: Fix behaviour of unreachable, blackhole and
      prohibit routes"), but actually had no effect, as it was, and is,
      overwritten by the fib_lookup() return code assignment, and anyway
      ignored in all other branches, including the if (fl4->saddr) one:
      I find this rather confusing, as it would look like -ENETUNREACH is
      the "default" error, while that statement has no effect.
      
      Also note that after commit fc75fc83 ("ipv4: dont create routes
      on down devices"), we would get -ENETUNREACH if the device is down,
      but -EINVAL if the source address is specified and we can't resolve
      the route, and this appears to be rather inconsistent.
      Reported-by: NStefan Walter <walteste@inf.ethz.ch>
      Analysed-by: NBenjamin Coddington <bcodding@redhat.com>
      Analysed-by: NGonzalo Siero <gsierohu@redhat.com>
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2fa80e64
    • W
      ipv4: fix race condition between route lookup and invalidation · 2ec0df4e
      Wei Wang 提交于
      [ Upstream commit 5018c59607a511cdee743b629c76206d9c9e6d7b ]
      
      Jesse and Ido reported the following race condition:
      <CPU A, t0> - Received packet A is forwarded and cached dst entry is
      taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set()
      
      <t1> - Given Jesse has busy routers ("ingesting full BGP routing tables
      from multiple ISPs"), route is added / deleted and rt_cache_flush() is
      called
      
      <CPU B, t2> - Received packet B tries to use the same cached dst entry
      from t0, but rt_cache_valid() is no longer true and it is replaced in
      rt_cache_route() by the newer one. This calls dst_dev_put() on the
      original dst entry which assigns the blackhole netdev to 'dst->dev'
      
      <CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due
      to 'dst->dev' being the blackhole netdev
      
      There are 2 issues in the v4 routing code:
      1. A per-netns counter is used to do the validation of the route. That
      means whenever a route is changed in the netns, users of all routes in
      the netns needs to redo lookup. v6 has an implementation of only
      updating fn_sernum for routes that are affected.
      2. When rt_cache_valid() returns false, rt_cache_route() is called to
      throw away the current cache, and create a new one. This seems
      unnecessary because as long as this route does not change, the route
      cache does not need to be recreated.
      
      To fully solve the above 2 issues, it probably needs quite some code
      changes and requires careful testing, and does not suite for net branch.
      
      So this patch only tries to add the deleted cached rt into the uncached
      list, so user could still be able to use it to receive packets until
      it's done.
      
      Fixes: 95c47f9c ("ipv4: call dst_dev_put() properly")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Reported-by: NIdo Schimmel <idosch@idosch.org>
      Reported-by: NJesse Hathaway <jesse@mbuki-mvuki.org>
      Tested-by: NJesse Hathaway <jesse@mbuki-mvuki.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ec0df4e
    • P
      netfilter: nft_connlimit: disable bh on garbage collection · a16a9c10
      Pablo Neira Ayuso 提交于
      [ Upstream commit 34a4c95abd25ab41fb390b985a08a651b1fa0b0f ]
      
      BH must be disabled when invoking nf_conncount_gc_list() to perform
      garbage collection, otherwise deadlock might happen.
      
        nf_conncount_add+0x1f/0x50 [nf_conncount]
        nft_connlimit_eval+0x4c/0xe0 [nft_connlimit]
        nft_dynset_eval+0xb5/0x100 [nf_tables]
        nft_do_chain+0xea/0x420 [nf_tables]
        ? sch_direct_xmit+0x111/0x360
        ? noqueue_init+0x10/0x10
        ? __qdisc_run+0x84/0x510
        ? tcp_packet+0x655/0x1610 [nf_conntrack]
        ? ip_finish_output2+0x1a7/0x430
        ? tcp_error+0x130/0x150 [nf_conntrack]
        ? nf_conntrack_in+0x1fc/0x4c0 [nf_conntrack]
        nft_do_chain_ipv4+0x66/0x80 [nf_tables]
        nf_hook_slow+0x44/0xc0
        ip_rcv+0xb5/0xd0
        ? ip_rcv_finish_core.isra.19+0x360/0x360
        __netif_receive_skb_one_core+0x52/0x70
        netif_receive_skb_internal+0x34/0xe0
        napi_gro_receive+0xba/0xe0
        e1000_clean_rx_irq+0x1e9/0x420 [e1000e]
        e1000e_poll+0xbe/0x290 [e1000e]
        net_rx_action+0x149/0x3b0
        __do_softirq+0xde/0x2d8
        irq_exit+0xba/0xc0
        do_IRQ+0x85/0xd0
        common_interrupt+0xf/0xf
        </IRQ>
        RIP: 0010:nf_conncount_gc_list+0x3b/0x130 [nf_conncount]
      
      Fixes: 2f971a8f4255 ("netfilter: nf_conncount: move all list iterations under spinlock")
      Reported-by: NLaura Garcia Liebana <nevola@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a16a9c10
    • M
      mac80211: fix txq null pointer dereference · 13104599
      Miaoqing Pan 提交于
      [ Upstream commit 8ed31a264065ae92058ce54aa3cc8da8d81dc6d7 ]
      
      If the interface type is P2P_DEVICE or NAN, read the file of
      '/sys/kernel/debug/ieee80211/phyx/netdev:wlanx/aqm' will get a
      NULL pointer dereference. As for those interface type, the
      pointer sdata->vif.txq is NULL.
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000011
      CPU: 1 PID: 30936 Comm: cat Not tainted 4.14.104 #1
      task: ffffffc0337e4880 task.stack: ffffff800cd20000
      PC is at ieee80211_if_fmt_aqm+0x34/0xa0 [mac80211]
      LR is at ieee80211_if_fmt_aqm+0x34/0xa0 [mac80211]
      [...]
      Process cat (pid: 30936, stack limit = 0xffffff800cd20000)
      [...]
      [<ffffff8000b7cd00>] ieee80211_if_fmt_aqm+0x34/0xa0 [mac80211]
      [<ffffff8000b7c414>] ieee80211_if_read+0x60/0xbc [mac80211]
      [<ffffff8000b7ccc4>] ieee80211_if_read_aqm+0x28/0x30 [mac80211]
      [<ffffff80082eff94>] full_proxy_read+0x2c/0x48
      [<ffffff80081eef00>] __vfs_read+0x2c/0xd4
      [<ffffff80081ef084>] vfs_read+0x8c/0x108
      [<ffffff80081ef494>] SyS_read+0x40/0x7c
      Signed-off-by: NMiaoqing Pan <miaoqing@codeaurora.org>
      Acked-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/1569549796-8223-1-git-send-email-miaoqing@codeaurora.org
      [trim useless data from commit message]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      13104599
    • M
      nl80211: fix null pointer dereference · 09c5a5bb
      Miaoqing Pan 提交于
      [ Upstream commit b501426cf86e70649c983c52f4c823b3c40d72a3 ]
      
      If the interface is not in MESH mode, the command 'iw wlanx mpath del'
      will cause kernel panic.
      
      The root cause is null pointer access in mpp_flush_by_proxy(), as the
      pointer 'sdata->u.mesh.mpp_paths' is NULL for non MESH interface.
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000068
      [...]
      PC is at _raw_spin_lock_bh+0x20/0x5c
      LR is at mesh_path_del+0x1c/0x17c [mac80211]
      [...]
      Process iw (pid: 4537, stack limit = 0xd83e0238)
      [...]
      [<c021211c>] (_raw_spin_lock_bh) from [<bf8c7648>] (mesh_path_del+0x1c/0x17c [mac80211])
      [<bf8c7648>] (mesh_path_del [mac80211]) from [<bf6cdb7c>] (extack_doit+0x20/0x68 [compat])
      [<bf6cdb7c>] (extack_doit [compat]) from [<c05c309c>] (genl_rcv_msg+0x274/0x30c)
      [<c05c309c>] (genl_rcv_msg) from [<c05c25d8>] (netlink_rcv_skb+0x58/0xac)
      [<c05c25d8>] (netlink_rcv_skb) from [<c05c2e14>] (genl_rcv+0x20/0x34)
      [<c05c2e14>] (genl_rcv) from [<c05c1f90>] (netlink_unicast+0x11c/0x204)
      [<c05c1f90>] (netlink_unicast) from [<c05c2420>] (netlink_sendmsg+0x30c/0x370)
      [<c05c2420>] (netlink_sendmsg) from [<c05886d0>] (sock_sendmsg+0x70/0x84)
      [<c05886d0>] (sock_sendmsg) from [<c0589f4c>] (___sys_sendmsg.part.3+0x188/0x228)
      [<c0589f4c>] (___sys_sendmsg.part.3) from [<c058add4>] (__sys_sendmsg+0x4c/0x70)
      [<c058add4>] (__sys_sendmsg) from [<c0208c80>] (ret_fast_syscall+0x0/0x44)
      Code: e2822c02 e2822001 e5832004 f590f000 (e1902f9f)
      ---[ end trace bbd717600f8f884d ]---
      Signed-off-by: NMiaoqing Pan <miaoqing@codeaurora.org>
      Link: https://lore.kernel.org/r/1569485810-761-1-git-send-email-miaoqing@codeaurora.org
      [trim useless data from commit message]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      09c5a5bb
  4. 12 10月, 2019 5 次提交
    • J
      nl80211: validate beacon head · 1bd17a73
      Johannes Berg 提交于
      commit f88eb7c0d002a67ef31aeb7850b42ff69abc46dc upstream.
      
      We currently don't validate the beacon head, i.e. the header,
      fixed part and elements that are to go in front of the TIM
      element. This means that the variable elements there can be
      malformed, e.g. have a length exceeding the buffer size, but
      most downstream code from this assumes that this has already
      been checked.
      
      Add the necessary checks to the netlink policy.
      
      Cc: stable@vger.kernel.org
      Fixes: ed1b6cc7 ("cfg80211/nl80211: add beacon settings")
      Link: https://lore.kernel.org/r/1569009255-I7ac7fbe9436e9d8733439eab8acbbd35e55c74ef@changeidSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bd17a73
    • J
      cfg80211: add and use strongly typed element iteration macros · ad180cac
      Johannes Berg 提交于
      commit 0f3b07f027f87a38ebe5c436490095df762819be upstream.
      
      Rather than always iterating elements from frames with pure
      u8 pointers, add a type "struct element" that encapsulates
      the id/datalen/data format of them.
      
      Then, add the element iteration macros
       * for_each_element
       * for_each_element_id
       * for_each_element_extid
      
      which take, as their first 'argument', such a structure and
      iterate through a given u8 array interpreting it as elements.
      
      While at it and since we'll need it, also add
       * for_each_subelement
       * for_each_subelement_id
       * for_each_subelement_extid
      
      which instead of taking data/length just take an outer element
      and use its data/datalen.
      
      Also add for_each_element_completed() to determine if any of
      the loops above completed, i.e. it was able to parse all of
      the elements successfully and no data remained.
      
      Use for_each_element_id() in cfg80211_find_ie_match() as the
      first user of this.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ad180cac
    • F
      netfilter: nf_tables: allow lookups in dynamic sets · f7ace7f2
      Florian Westphal 提交于
      [ Upstream commit acab713177377d9e0889c46bac7ff0cfb9a90c4d ]
      
      This un-breaks lookups in sets that have the 'dynamic' flag set.
      Given this active example configuration:
      
      table filter {
        set set1 {
          type ipv4_addr
          size 64
          flags dynamic,timeout
          timeout 1m
        }
      
        chain input {
           type filter hook input priority 0; policy accept;
        }
      }
      
      ... this works:
      nft add rule ip filter input add @set1 { ip saddr }
      
      -> whenever rule is triggered, the source ip address is inserted
      into the set (if it did not exist).
      
      This won't work:
      nft add rule ip filter input ip saddr @set1 counter
      Error: Could not process rule: Operation not supported
      
      In other words, we can add entries to the set, but then can't make
      matching decision based on that set.
      
      That is just wrong -- all set backends support lookups (else they would
      not be very useful).
      The failure comes from an explicit rejection in nft_lookup.c.
      
      Looking at the history, it seems like NFT_SET_EVAL used to mean
      'set contains expressions' (aka. "is a meter"), for instance something like
      
       nft add rule ip filter input meter example { ip saddr limit rate 10/second }
       or
       nft add rule ip filter input meter example { ip saddr counter }
      
      The actual meaning of NFT_SET_EVAL however, is
      'set can be updated from the packet path'.
      
      'meters' and packet-path insertions into sets, such as
      'add @set { ip saddr }' use exactly the same kernel code (nft_dynset.c)
      and thus require a set backend that provides the ->update() function.
      
      The only set that provides this also is the only one that has the
      NFT_SET_EVAL feature flag.
      
      Removing the wrong check makes the above example work.
      While at it, also fix the flag check during set instantiation to
      allow supported combinations only.
      
      Fixes: 8aeff920 ("netfilter: nf_tables: add stateful object reference to set elements")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f7ace7f2
    • L
      9p: Transport error uninitialized · 07f3596c
      Lu Shuaibing 提交于
      [ Upstream commit 0ce772fe79b68f83df40f07f28207b292785c677 ]
      
      The p9_tag_alloc() does not initialize the transport error t_err field.
      The struct p9_req_t *req is allocated and stored in a struct p9_client
      variable. The field t_err is never initialized before p9_conn_cancel()
      checks its value.
      
      KUMSAN(KernelUninitializedMemorySantizer, a new error detection tool)
      reports this bug.
      
      ==================================================================
      BUG: KUMSAN: use of uninitialized memory in p9_conn_cancel+0x2d9/0x3b0
      Read of size 4 at addr ffff88805f9b600c by task kworker/1:2/1216
      
      CPU: 1 PID: 1216 Comm: kworker/1:2 Not tainted 5.2.0-rc4+ #28
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      Workqueue: events p9_write_work
      Call Trace:
       dump_stack+0x75/0xae
       __kumsan_report+0x17c/0x3e6
       kumsan_report+0xe/0x20
       p9_conn_cancel+0x2d9/0x3b0
       p9_write_work+0x183/0x4a0
       process_one_work+0x4d1/0x8c0
       worker_thread+0x6e/0x780
       kthread+0x1ca/0x1f0
       ret_from_fork+0x35/0x40
      
      Allocated by task 1979:
       save_stack+0x19/0x80
       __kumsan_kmalloc.constprop.3+0xbc/0x120
       kmem_cache_alloc+0xa7/0x170
       p9_client_prepare_req.part.9+0x3b/0x380
       p9_client_rpc+0x15e/0x880
       p9_client_create+0x3d0/0xac0
       v9fs_session_init+0x192/0xc80
       v9fs_mount+0x67/0x470
       legacy_get_tree+0x70/0xd0
       vfs_get_tree+0x4a/0x1c0
       do_mount+0xba9/0xf90
       ksys_mount+0xa8/0x120
       __x64_sys_mount+0x62/0x70
       do_syscall_64+0x6d/0x1e0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 0:
      (stack is not available)
      
      The buggy address belongs to the object at ffff88805f9b6008
       which belongs to the cache p9_req_t of size 144
      The buggy address is located 4 bytes inside of
       144-byte region [ffff88805f9b6008, ffff88805f9b6098)
      The buggy address belongs to the page:
      page:ffffea00017e6d80 refcount:1 mapcount:0 mapping:ffff888068b63740 index:0xffff88805f9b7d90 compound_mapcount: 0
      flags: 0x100000000010200(slab|head)
      raw: 0100000000010200 ffff888068b66450 ffff888068b66450 ffff888068b63740
      raw: ffff88805f9b7d90 0000000000100001 00000001ffffffff 0000000000000000
      page dumped because: kumsan: bad access detected
      ==================================================================
      
      Link: http://lkml.kernel.org/r/20190613070854.10434-1-shuaibinglu@126.comSigned-off-by: NLu Shuaibing <shuaibinglu@126.com>
      [dominique.martinet@cea.fr: grouped the added init with the others]
      Signed-off-by: NDominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      07f3596c
    • J
      cfg80211: initialize on-stack chandefs · 3a0e6733
      Johannes Berg 提交于
      commit f43e5210c739fe76a4b0ed851559d6902f20ceb1 upstream.
      
      In a few places we don't properly initialize on-stack chandefs,
      resulting in EDMG data to be non-zero, which broke things.
      
      Additionally, in a few places we rely on the driver to init the
      data completely, but perhaps we shouldn't as non-EDMG drivers
      may not initialize the EDMG data, also initialize it there.
      
      Cc: stable@vger.kernel.org
      Fixes: 2a38075cd0be ("nl80211: Add support for EDMG channels")
      Reported-by: NDmitry Osipenko <digetx@gmail.com>
      Tested-by: NDmitry Osipenko <digetx@gmail.com>
      Link: https://lore.kernel.org/r/1569239475-I2dcce394ecf873376c386a78f31c2ec8b538fa25@changeidSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a0e6733
  5. 08 10月, 2019 6 次提交
    • A
      NFC: fix attrs checks in netlink interface · c8a65ec0
      Andrey Konovalov 提交于
      commit 18917d51472fe3b126a3a8f756c6b18085eb8130 upstream.
      
      nfc_genl_deactivate_target() relies on the NFC_ATTR_TARGET_INDEX
      attribute being present, but doesn't check whether it is actually
      provided by the user. Same goes for nfc_genl_fw_download() and
      NFC_ATTR_FIRMWARE_NAME.
      
      This patch adds appropriate checks.
      
      Found with syzkaller.
      Signed-off-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8a65ec0
    • E
      sch_cbq: validate TCA_CBQ_WRROPT to avoid crash · 74e2a311
      Eric Dumazet 提交于
      [ Upstream commit e9789c7cc182484fc031fd88097eb14cb26c4596 ]
      
      syzbot reported a crash in cbq_normalize_quanta() caused
      by an out of range cl->priority.
      
      iproute2 enforces this check, but malicious users do not.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN PTI
      Modules linked in:
      CPU: 1 PID: 26447 Comm: syz-executor.1 Not tainted 5.3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:cbq_normalize_quanta.part.0+0x1fd/0x430 net/sched/sch_cbq.c:902
      RSP: 0018:ffff8801a5c333b0 EFLAGS: 00010206
      RAX: 0000000020000003 RBX: 00000000fffffff8 RCX: ffffc9000712f000
      RDX: 00000000000043bf RSI: ffffffff83be8962 RDI: 0000000100000018
      RBP: ffff8801a5c33420 R08: 000000000000003a R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000002ef
      R13: ffff88018da95188 R14: dffffc0000000000 R15: 0000000000000015
      FS:  00007f37d26b1700(0000) GS:ffff8801dad00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004c7cec CR3: 00000001bcd0a006 CR4: 00000000001626f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       [<ffffffff83be9d57>] cbq_normalize_quanta include/net/pkt_sched.h:27 [inline]
       [<ffffffff83be9d57>] cbq_addprio net/sched/sch_cbq.c:1097 [inline]
       [<ffffffff83be9d57>] cbq_set_wrr+0x2d7/0x450 net/sched/sch_cbq.c:1115
       [<ffffffff83bee8a7>] cbq_change_class+0x987/0x225b net/sched/sch_cbq.c:1537
       [<ffffffff83b96985>] tc_ctl_tclass+0x555/0xcd0 net/sched/sch_api.c:2329
       [<ffffffff83a84655>] rtnetlink_rcv_msg+0x485/0xc10 net/core/rtnetlink.c:5248
       [<ffffffff83cadf0a>] netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2510
       [<ffffffff83a7db6d>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5266
       [<ffffffff83cac2c6>] netlink_unicast_kernel net/netlink/af_netlink.c:1324 [inline]
       [<ffffffff83cac2c6>] netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1350
       [<ffffffff83cacd4a>] netlink_sendmsg+0x89a/0xd50 net/netlink/af_netlink.c:1939
       [<ffffffff8399d46e>] sock_sendmsg_nosec net/socket.c:673 [inline]
       [<ffffffff8399d46e>] sock_sendmsg+0x12e/0x170 net/socket.c:684
       [<ffffffff8399f1fd>] ___sys_sendmsg+0x81d/0x960 net/socket.c:2359
       [<ffffffff839a2d05>] __sys_sendmsg+0x105/0x1d0 net/socket.c:2397
       [<ffffffff839a2df9>] SYSC_sendmsg net/socket.c:2406 [inline]
       [<ffffffff839a2df9>] SyS_sendmsg+0x29/0x30 net/socket.c:2404
       [<ffffffff8101ccc8>] do_syscall_64+0x528/0x770 arch/x86/entry/common.c:305
       [<ffffffff84400091>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74e2a311
    • T
      tipc: fix unlimited bundling of small messages · ed9420dd
      Tuong Lien 提交于
      [ Upstream commit e95584a889e1902fdf1ded9712e2c3c3083baf96 ]
      
      We have identified a problem with the "oversubscription" policy in the
      link transmission code.
      
      When small messages are transmitted, and the sending link has reached
      the transmit window limit, those messages will be bundled and put into
      the link backlog queue. However, bundles of data messages are counted
      at the 'CRITICAL' level, so that the counter for that level, instead of
      the counter for the real, bundled message's level is the one being
      increased.
      Subsequent, to-be-bundled data messages at non-CRITICAL levels continue
      to be tested against the unchanged counter for their own level, while
      contributing to an unrestrained increase at the CRITICAL backlog level.
      
      This leaves a gap in congestion control algorithm for small messages
      that can result in starvation for other users or a "real" CRITICAL
      user. Even that eventually can lead to buffer exhaustion & link reset.
      
      We fix this by keeping a 'target_bskb' buffer pointer at each levels,
      then when bundling, we only bundle messages at the same importance
      level only. This way, we know exactly how many slots a certain level
      have occupied in the queue, so can manage level congestion accurately.
      
      By bundling messages at the same level, we even have more benefits. Let
      consider this:
      - One socket sends 64-byte messages at the 'CRITICAL' level;
      - Another sends 4096-byte messages at the 'LOW' level;
      
      When a 64-byte message comes and is bundled the first time, we put the
      overhead of message bundle to it (+ 40-byte header, data copy, etc.)
      for later use, but the next message can be a 4096-byte one that cannot
      be bundled to the previous one. This means the last bundle carries only
      one payload message which is totally inefficient, as for the receiver
      also! Later on, another 64-byte message comes, now we make a new bundle
      and the same story repeats...
      
      With the new bundling algorithm, this will not happen, the 64-byte
      messages will be bundled together even when the 4096-byte message(s)
      comes in between. However, if the 4096-byte messages are sent at the
      same level i.e. 'CRITICAL', the bundling algorithm will again cause the
      same overhead.
      
      Also, the same will happen even with only one socket sending small
      messages at a rate close to the link transmit's one, so that, when one
      message is bundled, it's transmitted shortly. Then, another message
      comes, a new bundle is created and so on...
      
      We will solve this issue radically by another patch.
      
      Fixes: 365ad353 ("tipc: reduce risk of user starvation during link congestion")
      Reported-by: NHoang Le <hoang.h.le@dektech.com.au>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed9420dd
    • D
      net/rds: Fix error handling in rds_ib_add_one() · 36a4043c
      Dotan Barak 提交于
      [ Upstream commit d64bf89a75b65f83f06be9fb8f978e60d53752db ]
      
      rds_ibdev:ipaddr_list and rds_ibdev:conn_list are initialized
      after allocation some resources such as protection domain.
      If allocation of such resources fail, then these uninitialized
      variables are accessed in rds_ib_dev_free() in failure path. This
      can potentially crash the system. The code has been updated to
      initialize these variables very early in the function.
      Signed-off-by: NDotan Barak <dotanb@dev.mellanox.co.il>
      Signed-off-by: NSudhakar Dindukurti <sudhakar.dindukurti@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36a4043c
    • J
      udp: only do GSO if # of segs > 1 · 012363f5
      Josh Hunt 提交于
      [ Upstream commit 4094871db1d65810acab3d57f6089aa39ef7f648 ]
      
      Prior to this change an application sending <= 1MSS worth of data and
      enabling UDP GSO would fail if the system had SW GSO enabled, but the
      same send would succeed if HW GSO offload is enabled. In addition to this
      inconsistency the error in the SW GSO case does not get back to the
      application if sending out of a real device so the user is unaware of this
      failure.
      
      With this change we only perform GSO if the # of segments is > 1 even
      if the application has enabled segmentation. I've also updated the
      relevant udpgso selftests.
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: NJosh Hunt <johunt@akamai.com>
      Reviewed-by: NWillem de Bruijn <willemb@google.com>
      Reviewed-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      012363f5
    • D
      vsock: Fix a lockdep warning in __vsock_release() · 3c1f0704
      Dexuan Cui 提交于
      [ Upstream commit 0d9138ffac24cf8b75366ede3a68c951e6dcc575 ]
      
      Lockdep is unhappy if two locks from the same class are held.
      
      Fix the below warning for hyperv and virtio sockets (vmci socket code
      doesn't have the issue) by using lock_sock_nested() when __vsock_release()
      is called recursively:
      
      ============================================
      WARNING: possible recursive locking detected
      5.3.0+ #1 Not tainted
      --------------------------------------------
      server/1795 is trying to acquire lock:
      ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]
      
      but task is already holding lock:
      ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(sk_lock-AF_VSOCK);
        lock(sk_lock-AF_VSOCK);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by server/1795:
       #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
       #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      stack backtrace:
      CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
      Call Trace:
       dump_stack+0x67/0x90
       __lock_acquire.cold.67+0xd2/0x20b
       lock_acquire+0xb5/0x1c0
       lock_sock_nested+0x6d/0x90
       hvs_release+0x10/0x120 [hv_sock]
       __vsock_release+0x24/0xf0 [vsock]
       __vsock_release+0xa0/0xf0 [vsock]
       vsock_release+0x12/0x30 [vsock]
       __sock_release+0x37/0xa0
       sock_close+0x14/0x20
       __fput+0xc1/0x250
       task_work_run+0x98/0xc0
       do_exit+0x344/0xc60
       do_group_exit+0x47/0xb0
       get_signal+0x15c/0xc50
       do_signal+0x30/0x720
       exit_to_usermode_loop+0x50/0xa0
       do_syscall_64+0x24e/0x270
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f4184e85f31
      Tested-by: NStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c1f0704