1. 29 8月, 2019 7 次提交
    • D
      rxrpc: Fix the lack of notification when sendmsg() fails on a DATA packet · 4db2043e
      David Howells 提交于
      [ Upstream commit c69565ee6681e151e2bb80502930a16e04b553d1 ]
      
      Fix the fact that a notification isn't sent to the recvmsg side to indicate
      a call failed when sendmsg() fails to transmit a DATA packet with the error
      ENETUNREACH, EHOSTUNREACH or ECONNREFUSED.
      
      Without this notification, the afs client just sits there waiting for the
      call to complete in some manner (which it's not now going to do), which
      also pins the rxrpc call in place.
      
      This can be seen if the client has a scope-level IPv6 address, but not a
      global-level IPv6 address, and we try and transmit an operation to a
      server's IPv6 address.
      
      Looking in /proc/net/rxrpc/calls shows completed calls just sat there with
      an abort code of RX_USER_ABORT and an error code of -ENETUNREACH.
      
      Fixes: c54e43d7 ("rxrpc: Fix missing start of call timeout")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NMarc Dionne <marc.dionne@auristor.com>
      Reviewed-by: NJeffrey Altman <jaltman@auristor.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4db2043e
    • D
      rxrpc: Fix potential deadlock · 0d68fbc2
      David Howells 提交于
      [ Upstream commit 60034d3d146b11922ab1db613bce062dddc0327a ]
      
      There is a potential deadlock in rxrpc_peer_keepalive_dispatch() whereby
      rxrpc_put_peer() is called with the peer_hash_lock held, but if it reduces
      the peer's refcount to 0, rxrpc_put_peer() calls __rxrpc_put_peer() - which
      the tries to take the already held lock.
      
      Fix this by providing a version of rxrpc_put_peer() that can be called in
      situations where the lock is already held.
      
      The bug may produce the following lockdep report:
      
      ============================================
      WARNING: possible recursive locking detected
      5.2.0-next-20190718 #41 Not tainted
      --------------------------------------------
      kworker/0:3/21678 is trying to acquire lock:
      00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh
      /./include/linux/spinlock.h:343 [inline]
      00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
      __rxrpc_put_peer /net/rxrpc/peer_object.c:415 [inline]
      00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
      rxrpc_put_peer+0x2d3/0x6a0 /net/rxrpc/peer_object.c:435
      
      but task is already holding lock:
      00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh
      /./include/linux/spinlock.h:343 [inline]
      00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
      rxrpc_peer_keepalive_dispatch /net/rxrpc/peer_event.c:378 [inline]
      00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
      rxrpc_peer_keepalive_worker+0x6b3/0xd02 /net/rxrpc/peer_event.c:430
      
      Fixes: 330bdcfa ("rxrpc: Fix the keepalive generator [ver #2]")
      Reported-by: syzbot+72af434e4b3417318f84@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NMarc Dionne <marc.dionne@auristor.com>
      Reviewed-by: NJeffrey Altman <jaltman@auristor.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0d68fbc2
    • J
      netfilter: ipset: Fix rename concurrency with listing · 63dd147e
      Jozsef Kadlecsik 提交于
      [ Upstream commit 6c1f7e2c1b96ab9b09ac97c4df2bd9dc327206f6 ]
      
      Shijie Luo reported that when stress-testing ipset with multiple concurrent
      create, rename, flush, list, destroy commands, it can result
      
      ipset <version>: Broken LIST kernel message: missing DATA part!
      
      error messages and broken list results. The problem was the rename operation
      was not properly handled with respect of listing. The patch fixes the issue.
      Reported-by: NShijie Luo <luoshijie1@huawei.com>
      Signed-off-by: NJozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      63dd147e
    • S
      netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets · ea08214d
      Stefano Brivio 提交于
      [ Upstream commit 1b4a75108d5bc153daf965d334e77e8e94534f96 ]
      
      In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address
      for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the
      KADT functions for sets matching on MAC addreses the copy of source or
      destination MAC address depending on the configured match.
      
      This was done correctly for hash:mac, but for hash:ip,mac and
      bitmap:ip,mac, copying and pasting the same code block presents an
      obvious problem: in these two set types, the MAC address is the second
      dimension, not the first one, and we are actually selecting the MAC
      address depending on whether the first dimension (IP address) specifies
      source or destination.
      
      Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags.
      
      This way, mixing source and destination matches for the two dimensions
      of ip,mac set types works as expected. With this setup:
      
        ip netns add A
        ip link add veth1 type veth peer name veth2 netns A
        ip addr add 192.0.2.1/24 dev veth1
        ip -net A addr add 192.0.2.2/24 dev veth2
        ip link set veth1 up
        ip -net A link set veth2 up
      
        dst=$(ip netns exec A cat /sys/class/net/veth2/address)
      
        ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16
        ip netns exec A ipset add test_bitmap 192.0.2.1,${dst}
        ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP
      
        ip netns exec A ipset create test_hash hash:ip,mac
        ip netns exec A ipset add test_hash 192.0.2.1,${dst}
        ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP
      
      ipset correctly matches a test packet:
      
        # ping -c1 192.0.2.2 >/dev/null
        # echo $?
        0
      Reported-by: NChen Yi <yiche@redhat.com>
      Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NJozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ea08214d
    • S
      netfilter: ipset: Actually allow destination MAC address for hash:ip,mac sets too · 5a072ef6
      Stefano Brivio 提交于
      [ Upstream commit b89d15480d0cacacae1a0fe0b3da01b529f2914f ]
      
      In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address
      for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the
      KADT check that prevents matching on destination MAC addresses for
      hash:mac sets, but forgot to remove the same check for hash:ip,mac set.
      
      Drop this check: functionality is now commented in man pages and there's
      no reason to restrict to source MAC address matching anymore.
      Reported-by: NChen Yi <yiche@redhat.com>
      Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NJozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      5a072ef6
    • Y
      can: gw: Fix error path of cgw_module_init · bd2f4c7c
      YueHaibing 提交于
      [ Upstream commit b7a14297f102b6e2ce6f16feffebbb9bde1e9b55 ]
      
      This patch add error path for cgw_module_init to avoid possible crash if
      some error occurs.
      
      Fixes: c1aabdf3 ("can-gw: add netlink based CAN routing")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Acked-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      bd2f4c7c
    • W
      netfilter: ebtables: fix a memory leak bug in compat · 71305e8e
      Wenwen Wang 提交于
      [ Upstream commit 15a78ba1844a8e052c1226f930133de4cef4e7ad ]
      
      In compat_do_replace(), a temporary buffer is allocated through vmalloc()
      to hold entries copied from the user space. The buffer address is firstly
      saved to 'newinfo->entries', and later on assigned to 'entries_tmp'. Then
      the entries in this temporary buffer is copied to the internal kernel
      structure through compat_copy_entries(). If this copy process fails,
      compat_do_replace() should be terminated. However, the allocated temporary
      buffer is not freed on this path, leading to a memory leak.
      
      To fix the bug, free the buffer before returning from compat_do_replace().
      Signed-off-by: NWenwen Wang <wenwen@cs.uga.edu>
      Reviewed-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      71305e8e
  2. 25 8月, 2019 7 次提交
    • C
      tipc: initialise addr_trail_end when setting node addresses · cc4ff0f4
      Chris Packham 提交于
      [ Upstream commit 8874ecae2977e5a2d4f0ba301364435b81c05938 ]
      
      We set the field 'addr_trial_end' to 'jiffies', instead of the current
      value 0, at the moment the node address is initialized. This guarantees
      we don't inadvertently enter an address trial period when the node
      address is explicitly set by the user.
      Signed-off-by: NChris Packham <chris.packham@alliedtelesis.co.nz>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc4ff0f4
    • X
      sctp: fix the transport error_count check · eeb148d2
      Xin Long 提交于
      [ Upstream commit a1794de8b92ea6bc2037f445b296814ac826693e ]
      
      As the annotation says in sctp_do_8_2_transport_strike():
      
        "If the transport error count is greater than the pf_retrans
         threshold, and less than pathmaxrtx ..."
      
      It should be transport->error_count checked with pathmaxrxt,
      instead of asoc->pf_retrans.
      
      Fixes: 5aa93bcf ("sctp: Implement quick failover draft from tsvwg")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eeb148d2
    • Z
      sctp: fix memleak in sctp_send_reset_streams · 227f204a
      zhengbin 提交于
      [ Upstream commit 6d5afe20397b478192ed8c38ec0ee10fa3aec649 ]
      
      If the stream outq is not empty, need to kfree nstr_list.
      
      Fixes: d570a59c ("sctp: only allow the out stream reset when the stream outq is empty")
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: Nzhengbin <zhengbin13@huawei.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      227f204a
    • E
      net/packet: fix race in tpacket_snd() · 154e6bc4
      Eric Dumazet 提交于
      [ Upstream commit 32d3182cd2cd29b2e7e04df7b0db350fbe11289f ]
      
      packet_sendmsg() checks tx_ring.pg_vec to decide
      if it must call tpacket_snd().
      
      Problem is that the check is lockless, meaning another thread
      can issue a concurrent setsockopt(PACKET_TX_RING ) to flip
      tx_ring.pg_vec back to NULL.
      
      Given that tpacket_snd() grabs pg_vec_lock mutex, we can
      perform the check again to solve the race.
      
      syzbot reported :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 11429 Comm: syz-executor394 Not tainted 5.3.0-rc4+ #101
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:packet_lookup_frame+0x8d/0x270 net/packet/af_packet.c:474
      Code: c1 ee 03 f7 73 0c 80 3c 0e 00 0f 85 cb 01 00 00 48 8b 0b 89 c0 4c 8d 24 c1 48 b8 00 00 00 00 00 fc ff df 4c 89 e1 48 c1 e9 03 <80> 3c 01 00 0f 85 94 01 00 00 48 8d 7b 10 4d 8b 3c 24 48 b8 00 00
      RSP: 0018:ffff88809f82f7b8 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: ffff8880a45c7030 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 1ffff110148b8e06 RDI: ffff8880a45c703c
      RBP: ffff88809f82f7e8 R08: ffff888087aea200 R09: fffffbfff134ae50
      R10: fffffbfff134ae4f R11: ffffffff89a5727f R12: 0000000000000000
      R13: 0000000000000001 R14: ffff8880a45c6ac0 R15: 0000000000000000
      FS:  00007fa04716f700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa04716edb8 CR3: 0000000091eb4000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       packet_current_frame net/packet/af_packet.c:487 [inline]
       tpacket_snd net/packet/af_packet.c:2667 [inline]
       packet_sendmsg+0x590/0x6250 net/packet/af_packet.c:2975
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:657
       ___sys_sendmsg+0x3e2/0x920 net/socket.c:2311
       __sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439
       do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 69e3c75f ("net: TX_RING and packet mmap")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      154e6bc4
    • C
      net: dsa: Check existence of .port_mdb_add callback before calling it · 8905a249
      Chen-Yu Tsai 提交于
      [ Upstream commit 58799865be84e2a895dab72de0e1b996ed943f22 ]
      
      The dsa framework has optional .port_mdb_{prepare,add,del} callback fields
      for drivers to handle multicast database entries. When adding an entry, the
      framework goes through a prepare phase, then a commit phase. Drivers not
      providing these callbacks should be detected in the prepare phase.
      
      DSA core may still bypass the bridge layer and call the dsa_port_mdb_add
      function directly with no prepare phase or no switchdev trans object,
      and the framework ends up calling an undefined .port_mdb_add callback.
      This results in a NULL pointer dereference, as shown in the log below.
      
      The other functions seem to be properly guarded. Do the same for
      .port_mdb_add in dsa_switch_mdb_add_bitmap() as well.
      
          8<--- cut here ---
          Unable to handle kernel NULL pointer dereference at virtual address 00000000
          pgd = (ptrval)
          [00000000] *pgd=00000000
          Internal error: Oops: 80000005 [#1] SMP ARM
          Modules linked in: rtl8xxxu rtl8192cu rtl_usb rtl8192c_common rtlwifi mac80211 cfg80211
          CPU: 1 PID: 134 Comm: kworker/1:2 Not tainted 5.3.0-rc1-00247-gd3519030752a #1
          Hardware name: Allwinner sun7i (A20) Family
          Workqueue: events switchdev_deferred_process_work
          PC is at 0x0
          LR is at dsa_switch_event+0x570/0x620
          pc : [<00000000>]    lr : [<c08533ec>]    psr: 80070013
          sp : ee871db8  ip : 00000000  fp : ee98d0a4
          r10: 0000000c  r9 : 00000008  r8 : ee89f710
          r7 : ee98d040  r6 : ee98d088  r5 : c0f04c48  r4 : ee98d04c
          r3 : 00000000  r2 : ee89f710  r1 : 00000008  r0 : ee98d040
          Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
          Control: 10c5387d  Table: 6deb406a  DAC: 00000051
          Process kworker/1:2 (pid: 134, stack limit = 0x(ptrval))
          Stack: (0xee871db8 to 0xee872000)
          1da0:                                                       ee871e14 103ace2d
          1dc0: 00000000 ffffffff 00000000 ee871e14 00000005 00000000 c08524a0 00000000
          1de0: ffffe000 c014bdfc c0f04c48 ee871e98 c0f04c48 ee9e5000 c0851120 c014bef0
          1e00: 00000000 b643aea2 ee9b4068 c08509a8 ee2bf940 ee89f710 ee871ecb 00000000
          1e20: 00000008 103ace2d 00000000 c087e248 ee29c868 103ace2d 00000001 ffffffff
          1e40: 00000000 ee871e98 00000006 00000000 c0fb2a50 c087e2d0 ffffffff c08523c4
          1e60: ffffffff c014bdfc 00000006 c0fad2d0 ee871e98 ee89f710 00000000 c014c500
          1e80: 00000000 ee89f3c0 c0f04c48 00000000 ee9e5000 c087dfb4 ee9e5000 00000000
          1ea0: ee89f710 ee871ecb 00000001 103ace2d 00000000 c0f04c48 00000000 c087e0a8
          1ec0: 00000000 efd9a3e0 0089f3c0 103ace2d ee89f700 ee89f710 ee9e5000 00000122
          1ee0: 00000100 c087e130 ee89f700 c0fad2c8 c1003ef0 c087de4c 2e928000 c0fad2ec
          1f00: c0fad2ec ee839580 ef7a62c0 ef7a9400 00000000 c087def8 c0fad2ec c01447dc
          1f20: ef315640 ef7a62c0 00000008 ee839580 ee839594 ef7a62c0 00000008 c0f03d00
          1f40: ef7a62d8 ef7a62c0 ffffe000 c0145b84 ffffe000 c0fb2420 c0bfaa8c 00000000
          1f60: ffffe000 ee84b600 ee84b5c0 00000000 ee870000 ee839580 c0145b40 ef0e5ea4
          1f80: ee84b61c c014a6f8 00000001 ee84b5c0 c014a5b0 00000000 00000000 00000000
          1fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
          1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
          1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
          [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
          [<c014bdfc>] (notifier_call_chain) from [<c014bef0>] (raw_notifier_call_chain+0x18/0x20)
          [<c014bef0>] (raw_notifier_call_chain) from [<c08509a8>] (dsa_port_mdb_add+0x48/0x74)
          [<c08509a8>] (dsa_port_mdb_add) from [<c087e248>] (__switchdev_handle_port_obj_add+0x54/0xd4)
          [<c087e248>] (__switchdev_handle_port_obj_add) from [<c087e2d0>] (switchdev_handle_port_obj_add+0x8/0x14)
          [<c087e2d0>] (switchdev_handle_port_obj_add) from [<c08523c4>] (dsa_slave_switchdev_blocking_event+0x94/0xa4)
          [<c08523c4>] (dsa_slave_switchdev_blocking_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
          [<c014bdfc>] (notifier_call_chain) from [<c014c500>] (blocking_notifier_call_chain+0x50/0x68)
          [<c014c500>] (blocking_notifier_call_chain) from [<c087dfb4>] (switchdev_port_obj_notify+0x44/0xa8)
          [<c087dfb4>] (switchdev_port_obj_notify) from [<c087e0a8>] (switchdev_port_obj_add_now+0x90/0x104)
          [<c087e0a8>] (switchdev_port_obj_add_now) from [<c087e130>] (switchdev_port_obj_add_deferred+0x14/0x5c)
          [<c087e130>] (switchdev_port_obj_add_deferred) from [<c087de4c>] (switchdev_deferred_process+0x64/0x104)
          [<c087de4c>] (switchdev_deferred_process) from [<c087def8>] (switchdev_deferred_process_work+0xc/0x14)
          [<c087def8>] (switchdev_deferred_process_work) from [<c01447dc>] (process_one_work+0x218/0x50c)
          [<c01447dc>] (process_one_work) from [<c0145b84>] (worker_thread+0x44/0x5bc)
          [<c0145b84>] (worker_thread) from [<c014a6f8>] (kthread+0x148/0x150)
          [<c014a6f8>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
          Exception stack(0xee871fb0 to 0xee871ff8)
          1fa0:                                     00000000 00000000 00000000 00000000
          1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
          1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
          Code: bad PC value
          ---[ end trace 1292c61abd17b130 ]---
      
          [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
          corresponds to
      
      	$ arm-linux-gnueabihf-addr2line -C -i -e vmlinux c08533ec
      
      	linux/net/dsa/switch.c:156
      	linux/net/dsa/switch.c:178
      	linux/net/dsa/switch.c:328
      
      Fixes: e6db98db ("net: dsa: add switch mdb bitmap functions")
      Signed-off-by: NChen-Yu Tsai <wens@csie.org>
      Reviewed-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8905a249
    • D
      netfilter: conntrack: Use consistent ct id hash calculation · 28ff7d3b
      Dirk Morris 提交于
      commit 656c8e9cc1badbc18eefe6ba01d33ebbcae61b9a upstream.
      
      Change ct id hash calculation to only use invariants.
      
      Currently the ct id hash calculation is based on some fields that can
      change in the lifetime on a conntrack entry in some corner cases. The
      current hash uses the whole tuple which contains an hlist pointer which
      will change when the conntrack is placed on the dying list resulting in
      a ct id change.
      
      This patch also removes the reply-side tuple and extension pointer from
      the hash calculation so that the ct id will will not change from
      initialization until confirmation.
      
      Fixes: 3c79107631db1f7 ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id")
      Signed-off-by: NDirk Morris <dmorris@metaloft.com>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28ff7d3b
    • F
      netfilter: ebtables: also count base chain policies · cef0e9eb
      Florian Westphal 提交于
      commit 3b48300d5cc7c7bed63fddb006c4046549ed4aec upstream.
      
      ebtables doesn't include the base chain policies in the rule count,
      so we need to add them manually when we call into the x_tables core
      to allocate space for the comapt offset table.
      
      This lead syzbot to trigger:
      WARNING: CPU: 1 PID: 9012 at net/netfilter/x_tables.c:649
      xt_compat_add_offset.cold+0x11/0x36 net/netfilter/x_tables.c:649
      
      Reported-by: syzbot+276ddebab3382bbf72db@syzkaller.appspotmail.com
      Fixes: 2035f3ff8eaa ("netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cef0e9eb
  3. 16 8月, 2019 6 次提交
  4. 09 8月, 2019 13 次提交
    • A
      compat_ioctl: pppoe: fix PPPOEIOCSFWD handling · e6e9bcef
      Arnd Bergmann 提交于
      [ Upstream commit 055d88242a6046a1ceac3167290f054c72571cd9 ]
      
      Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in
      linux-2.5.69 along with hundreds of other commands, but was always broken
      sincen only the structure is compatible, but the command number is not,
      due to the size being sizeof(size_t), or at first sizeof(sizeof((struct
      sockaddr_pppox)), which is different on 64-bit architectures.
      
      Guillaume Nault adds:
      
        And the implementation was broken until 2016 (see 29e73269 ("pppoe:
        fix reference counting in PPPoE proxy")), and nobody ever noticed. I
        should probably have removed this ioctl entirely instead of fixing it.
        Clearly, it has never been used.
      
      Fix it by adding a compat_ioctl handler for all pppoe variants that
      translates the command number and then calls the regular ioctl function.
      
      All other ioctl commands handled by pppoe are compatible between 32-bit
      and 64-bit, and require compat_ptr() conversion.
      
      This should apply to all stable kernels.
      Acked-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6e9bcef
    • T
      tipc: compat: allow tipc commands without arguments · 5295d651
      Taras Kondratiuk 提交于
      [ Upstream commit 4da5f0018eef4c0de31675b670c80e82e13e99d1 ]
      
      Commit 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit")
      broke older tipc tools that use compat interface (e.g. tipc-config from
      tipcutils package):
      
      % tipc-config -p
      operation not supported
      
      The commit started to reject TIPC netlink compat messages that do not
      have attributes. It is too restrictive because some of such messages are
      valid (they don't need any arguments):
      
      % grep 'tx none' include/uapi/linux/tipc_config.h
      #define  TIPC_CMD_NOOP              0x0000    /* tx none, rx none */
      #define  TIPC_CMD_GET_MEDIA_NAMES   0x0002    /* tx none, rx media_name(s) */
      #define  TIPC_CMD_GET_BEARER_NAMES  0x0003    /* tx none, rx bearer_name(s) */
      #define  TIPC_CMD_SHOW_PORTS        0x0006    /* tx none, rx ultra_string */
      #define  TIPC_CMD_GET_REMOTE_MNG    0x4003    /* tx none, rx unsigned */
      #define  TIPC_CMD_GET_MAX_PORTS     0x4004    /* tx none, rx unsigned */
      #define  TIPC_CMD_GET_NETID         0x400B    /* tx none, rx unsigned */
      #define  TIPC_CMD_NOT_NET_ADMIN     0xC001    /* tx none, rx none */
      
      This patch relaxes the original fix and rejects messages without
      arguments only if such arguments are expected by a command (reg_type is
      non zero).
      
      Fixes: 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit")
      Cc: stable@vger.kernel.org
      Signed-off-by: NTaras Kondratiuk <takondra@cisco.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5295d651
    • U
      net/smc: do not schedule tx_work in SMC_CLOSED state · ce58a365
      Ursula Braun 提交于
      [ Upstream commit f9cedf1a9b1cdcfb0c52edb391d01771e43994a4 ]
      
      The setsockopts options TCP_NODELAY and TCP_CORK may schedule the
      tx worker. Make sure the socket is not yet moved into SMC_CLOSED
      state (for instance by a shutdown SHUT_RDWR call).
      
      Reported-by: syzbot+92209502e7aab127c75f@syzkaller.appspotmail.com
      Reported-by: syzbot+b972214bb803a343f4fe@syzkaller.appspotmail.com
      Fixes: 01d2f7e2 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce58a365
    • D
      net: sched: use temporary variable for actions indexes · 51d240a1
      Dmytro Linkin 提交于
      [ Upstream commit 7be8ef2cdbfe41a2e524b7c6cc3f8e6cfaa906e4 ]
      
      Currently init call of all actions (except ipt) init their 'parm'
      structure as a direct pointer to nla data in skb. This leads to race
      condition when some of the filter actions were initialized successfully
      (and were assigned with idr action index that was written directly
      into nla data), but then were deleted and retried (due to following
      action module missing or classifier-initiated retry), in which case
      action init code tries to insert action to idr with index that was
      assigned on previous iteration. During retry the index can be reused
      by another action that was inserted concurrently, which causes
      unintended action sharing between filters.
      To fix described race condition, save action idr index to temporary
      stack-allocated variable instead on nla data.
      
      Fixes: 0190c1d4 ("net: sched: atomically check-allocate action")
      Signed-off-by: NDmytro Linkin <dmitrolin@mellanox.com>
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51d240a1
    • R
      net sched: update vlan action for batched events operations · cb20f741
      Roman Mashak 提交于
      [ Upstream commit b35475c5491a14c8ce7a5046ef7bcda8a860581a ]
      
      Add get_fill_size() routine used to calculate the action size
      when building a batch of events.
      
      Fixes: c7e2b968 ("sched: introduce vlan action")
      Signed-off-by: NRoman Mashak <mrv@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb20f741
    • J
      net: sched: Fix a possible null-pointer dereference in dequeue_func() · d82dc254
      Jia-Ju Bai 提交于
      [ Upstream commit 051c7b39be4a91f6b7d8c4548444e4b850f1f56c ]
      
      In dequeue_func(), there is an if statement on line 74 to check whether
      skb is NULL:
          if (skb)
      
      When skb is NULL, it is used on line 77:
          prefetch(&skb->end);
      
      Thus, a possible null-pointer dereference may occur.
      
      To fix this bug, skb->end is used when skb is not NULL.
      
      This bug is found by a static analysis tool STCheck written by us.
      
      Fixes: 76e3cc12 ("codel: Controlled Delay AQM")
      Signed-off-by: NJia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d82dc254
    • J
      net: fix ifindex collision during namespace removal · edb7ad69
      Jiri Pirko 提交于
      [ Upstream commit 55b40dbf0e76b4bfb9d8b3a16a0208640a9a45df ]
      
      Commit aca51397 ("netns: Fix arbitrary net_device-s corruptions
      on net_ns stop.") introduced a possibility to hit a BUG in case device
      is returning back to init_net and two following conditions are met:
      1) dev->ifindex value is used in a name of another "dev%d"
         device in init_net.
      2) dev->name is used by another device in init_net.
      
      Under real life circumstances this is hard to get. Therefore this has
      been present happily for over 10 years. To reproduce:
      
      $ ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
      3: enp0s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
      $ ip netns add ns1
      $ ip -n ns1 link add dummy1ns1 type dummy
      $ ip -n ns1 link add dummy2ns1 type dummy
      $ ip link set enp0s2 netns ns1
      $ ip -n ns1 link set enp0s2 name dummy0
      [  100.858894] virtio_net virtio0 dummy0: renamed from enp0s2
      $ ip link add dev4 type dummy
      $ ip -n ns1 a
      1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      2: dummy1ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 16:63:4c:38:3e:ff brd ff:ff:ff:ff:ff:ff
      3: dummy2ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether aa:9e:86:dd:6b:5d brd ff:ff:ff:ff:ff:ff
      4: dummy0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
      $ ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
      4: dev4: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 5a:e1:4a:b6:ec:f8 brd ff:ff:ff:ff:ff:ff
      $ ip netns del ns1
      [  158.717795] default_device_exit: failed to move dummy0 to init_net: -17
      [  158.719316] ------------[ cut here ]------------
      [  158.720591] kernel BUG at net/core/dev.c:9824!
      [  158.722260] invalid opcode: 0000 [#1] SMP KASAN PTI
      [  158.723728] CPU: 0 PID: 56 Comm: kworker/u2:1 Not tainted 5.3.0-rc1+ #18
      [  158.725422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      [  158.727508] Workqueue: netns cleanup_net
      [  158.728915] RIP: 0010:default_device_exit.cold+0x1d/0x1f
      [  158.730683] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
      [  158.736854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
      [  158.738752] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
      [  158.741369] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
      [  158.743418] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
      [  158.745626] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
      [  158.748405] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
      [  158.750638] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
      [  158.752944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  158.755245] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
      [  158.757654] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  158.760012] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  158.762758] Call Trace:
      [  158.763882]  ? dev_change_net_namespace+0xbb0/0xbb0
      [  158.766148]  ? devlink_nl_cmd_set_doit+0x520/0x520
      [  158.768034]  ? dev_change_net_namespace+0xbb0/0xbb0
      [  158.769870]  ops_exit_list.isra.0+0xa8/0x150
      [  158.771544]  cleanup_net+0x446/0x8f0
      [  158.772945]  ? unregister_pernet_operations+0x4a0/0x4a0
      [  158.775294]  process_one_work+0xa1a/0x1740
      [  158.776896]  ? pwq_dec_nr_in_flight+0x310/0x310
      [  158.779143]  ? do_raw_spin_lock+0x11b/0x280
      [  158.780848]  worker_thread+0x9e/0x1060
      [  158.782500]  ? process_one_work+0x1740/0x1740
      [  158.784454]  kthread+0x31b/0x420
      [  158.786082]  ? __kthread_create_on_node+0x3f0/0x3f0
      [  158.788286]  ret_from_fork+0x3a/0x50
      [  158.789871] ---[ end trace defd6c657c71f936 ]---
      [  158.792273] RIP: 0010:default_device_exit.cold+0x1d/0x1f
      [  158.795478] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
      [  158.804854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
      [  158.807865] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
      [  158.811794] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
      [  158.816652] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
      [  158.820930] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
      [  158.825113] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
      [  158.829899] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
      [  158.834923] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  158.838164] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
      [  158.841917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  158.845149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fix this by checking if a device with the same name exists in init_net
      and fallback to original code - dev%d to allocate name - in case it does.
      
      This was found using syzkaller.
      
      Fixes: aca51397 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.")
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      edb7ad69
    • N
      net: bridge: mcast: don't delete permanent entries when fast leave is enabled · a19d4e34
      Nikolay Aleksandrov 提交于
      [ Upstream commit 5c725b6b65067909548ac9ca9bc777098ec9883d ]
      
      When permanent entries were introduced by the commit below, they were
      exempt from timing out and thus igmp leave wouldn't affect them unless
      fast leave was enabled on the port which was added before permanent
      entries existed. It shouldn't matter if fast leave is enabled or not
      if the user added a permanent entry it shouldn't be deleted on igmp
      leave.
      
      Before:
      $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
      $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      < join and leave 229.1.1.1 on eth4 >
      
      $ bridge mdb show
      $
      
      After:
      $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
      $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      < join and leave 229.1.1.1 on eth4 >
      
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      Fixes: ccb1c31a ("bridge: add flags to distinguish permanent mdb entires")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a19d4e34
    • N
      net: bridge: delete local fdb on device init failure · 639239be
      Nikolay Aleksandrov 提交于
      [ Upstream commit d7bae09fa008c6c9a489580db0a5a12063b97f97 ]
      
      On initialization failure we have to delete the local fdb which was
      inserted due to the default pvid creation. This problem has been present
      since the inception of default_pvid. Note that currently there are 2 cases:
      1) in br_dev_init() when br_multicast_init() fails
      2) if register_netdevice() fails after calling ndo_init()
      
      This patch takes care of both since br_vlan_flush() is called on both
      occasions. Also the new fdb delete would be a no-op on normal bridge
      device destruction since the local fdb would've been already flushed by
      br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is
      called last when adding a port thus nothing can fail after it.
      
      Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com
      Fixes: 5be5a2df ("bridge: Add filtering support for default_pvid")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      639239be
    • H
      ipip: validate header length in ipip_tunnel_xmit · f186fb5c
      Haishuang Yan 提交于
      [ Upstream commit 47d858d0bdcd47cc1c6c9eeca91b091dd9e55637 ]
      
      We need the same checks introduced by commit cb9f1b783850
      ("ip: validate header length on virtual device xmit") for
      ipip tunnel.
      
      Fixes: cb9f1b783850b ("ip: validate header length on virtual device xmit")
      Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f186fb5c
    • H
      ip6_tunnel: fix possible use-after-free on xmit · 1bb2dd37
      Haishuang Yan 提交于
      [ Upstream commit 01f5bffad555f8e22a61f4b1261fe09cf1b96994 ]
      
      ip4ip6/ip6ip6 tunnels run iptunnel_handle_offloads on xmit which
      can cause a possible use-after-free accessing iph/ipv6h pointer
      since the packet will be 'uncloned' running pskb_expand_head if
      it is a cloned gso skb.
      
      Fixes: 0e9a7095 ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
      Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bb2dd37
    • H
      ip6_gre: reload ipv6h in prepare_ip6gre_xmit_ipv6 · fdcefa46
      Haishuang Yan 提交于
      [ Upstream commit 3bc817d665ac6d9de89f59df522ad86f5b5dfc03 ]
      
      Since ip6_tnl_parse_tlv_enc_lim() can call pskb_may_pull()
      which may change skb->data, so we need to re-load ipv6h at
      the right place.
      
      Fixes: 898b2979 ("ip6_gre: Refactor ip6gre xmit codes")
      Cc: William Tu <u9012063@gmail.com>
      Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Acked-by: NWilliam Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fdcefa46
    • C
      ife: error out when nla attributes are empty · c4c88993
      Cong Wang 提交于
      [ Upstream commit c8ec4632c6ac9cda0e8c3d51aa41eeab66585bd5 ]
      
      act_ife at least requires TCA_IFE_PARMS, so we have to bail out
      when there is no attribute passed in.
      
      Reported-by: syzbot+fbb5b288c9cb6a2eeac4@syzkaller.appspotmail.com
      Fixes: ef6980b6 ("introduce IFE action")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c4c88993
  5. 04 8月, 2019 3 次提交
  6. 31 7月, 2019 1 次提交
    • S
      hvsock: fix epollout hang from race condition · 49fb03de
      Sunil Muthuswamy 提交于
      [ Upstream commit cb359b60416701c8bed82fec79de25a144beb893 ]
      
      Currently, hvsock can enter into a state where epoll_wait on EPOLLOUT will
      not return even when the hvsock socket is writable, under some race
      condition. This can happen under the following sequence:
      - fd = socket(hvsocket)
      - fd_out = dup(fd)
      - fd_in = dup(fd)
      - start a writer thread that writes data to fd_out with a combination of
        epoll_wait(fd_out, EPOLLOUT) and
      - start a reader thread that reads data from fd_in with a combination of
        epoll_wait(fd_in, EPOLLIN)
      - On the host, there are two threads that are reading/writing data to the
        hvsocket
      
      stack:
      hvs_stream_has_space
      hvs_notify_poll_out
      vsock_poll
      sock_poll
      ep_poll
      
      Race condition:
      check for epollout from ep_poll():
      	assume no writable space in the socket
      	hvs_stream_has_space() returns 0
      check for epollin from ep_poll():
      	assume socket has some free space < HVS_PKT_LEN(HVS_SEND_BUF_SIZE)
      	hvs_stream_has_space() will clear the channel pending send size
      	host will not notify the guest because the pending send size has
      		been cleared and so the hvsocket will never mark the
      		socket writable
      
      Now, the EPOLLOUT will never return even if the socket write buffer is
      empty.
      
      The fix is to set the pending size to the default size and never change it.
      This way the host will always notify the guest whenever the writable space
      is bigger than the pending size. The host is already optimized to *only*
      notify the guest when the pending size threshold boundary is crossed and
      not everytime.
      
      This change also reduces the cpu usage somewhat since hv_stream_has_space()
      is in the hotpath of send:
      vsock_stream_sendmsg()->hv_stream_has_space()
      Earlier hv_stream_has_space was setting/clearing the pending size on every
      call.
      Signed-off-by: NSunil Muthuswamy <sunilmut@microsoft.com>
      Reviewed-by: NDexuan Cui <decui@microsoft.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      49fb03de
  7. 28 7月, 2019 3 次提交
    • V
      net: sched: verify that q!=NULL before setting q->flags · 60e9babf
      Vlad Buslov 提交于
      commit 503d81d428bd598430f7f9d02021634e1a8139a0 upstream.
      
      In function int tc_new_tfilter() q pointer can be NULL when adding filter
      on a shared block. With recent change that resets TCQ_F_CAN_BYPASS after
      filter creation, following NULL pointer dereference happens in case parent
      block is shared:
      
      [  212.925060] BUG: kernel NULL pointer dereference, address: 0000000000000010
      [  212.925445] #PF: supervisor write access in kernel mode
      [  212.925709] #PF: error_code(0x0002) - not-present page
      [  212.925965] PGD 8000000827923067 P4D 8000000827923067 PUD 827924067 PMD 0
      [  212.926302] Oops: 0002 [#1] SMP KASAN PTI
      [  212.926539] CPU: 18 PID: 2617 Comm: tc Tainted: G    B             5.2.0+ #512
      [  212.926938] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  212.927364] RIP: 0010:tc_new_tfilter+0x698/0xd40
      [  212.927633] Code: 74 0d 48 85 c0 74 08 48 89 ef e8 03 aa 62 00 48 8b 84 24 a0 00 00 00 48 8d 78 10 48 89 44 24 18 e8 4d 0c 6b ff 48 8b 44 24 18 <83> 60 10 f
      b 48 85 ed 0f 85 3d fe ff ff e9 4f fe ff ff e8 81 26 f8
      [  212.928607] RSP: 0018:ffff88884fd5f5d8 EFLAGS: 00010296
      [  212.928905] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000
      [  212.929201] RDX: 0000000000000007 RSI: 0000000000000004 RDI: 0000000000000297
      [  212.929402] RBP: ffff88886bedd600 R08: ffffffffb91d4b51 R09: fffffbfff7616e4d
      [  212.929609] R10: fffffbfff7616e4c R11: ffffffffbb0b7263 R12: ffff88886bc61040
      [  212.929803] R13: ffff88884fd5f950 R14: ffffc900039c5000 R15: ffff88835e927680
      [  212.929999] FS:  00007fe7c50b6480(0000) GS:ffff88886f980000(0000) knlGS:0000000000000000
      [  212.930235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  212.930394] CR2: 0000000000000010 CR3: 000000085bd04002 CR4: 00000000001606e0
      [  212.930588] Call Trace:
      [  212.930682]  ? tc_del_tfilter+0xa40/0xa40
      [  212.930811]  ? __lock_acquire+0x5b5/0x2460
      [  212.930948]  ? find_held_lock+0x85/0xa0
      [  212.931081]  ? tc_del_tfilter+0xa40/0xa40
      [  212.931201]  rtnetlink_rcv_msg+0x4ab/0x5f0
      [  212.931332]  ? rtnl_dellink+0x490/0x490
      [  212.931454]  ? lockdep_hardirqs_on+0x260/0x260
      [  212.931589]  ? netlink_deliver_tap+0xab/0x5a0
      [  212.931717]  ? match_held_lock+0x1b/0x240
      [  212.931844]  netlink_rcv_skb+0xd0/0x200
      [  212.931958]  ? rtnl_dellink+0x490/0x490
      [  212.932079]  ? netlink_ack+0x440/0x440
      [  212.932205]  ? netlink_deliver_tap+0x161/0x5a0
      [  212.932335]  ? lock_downgrade+0x360/0x360
      [  212.932457]  ? lock_acquire+0xe5/0x210
      [  212.932579]  netlink_unicast+0x296/0x350
      [  212.932705]  ? netlink_attachskb+0x390/0x390
      [  212.932834]  ? _copy_from_iter_full+0xe0/0x3a0
      [  212.932976]  netlink_sendmsg+0x394/0x600
      [  212.937998]  ? netlink_unicast+0x350/0x350
      [  212.943033]  ? move_addr_to_kernel.part.0+0x90/0x90
      [  212.948115]  ? netlink_unicast+0x350/0x350
      [  212.953185]  sock_sendmsg+0x96/0xa0
      [  212.958099]  ___sys_sendmsg+0x482/0x520
      [  212.962881]  ? match_held_lock+0x1b/0x240
      [  212.967618]  ? copy_msghdr_from_user+0x250/0x250
      [  212.972337]  ? lock_downgrade+0x360/0x360
      [  212.976973]  ? rwlock_bug.part.0+0x60/0x60
      [  212.981548]  ? __mod_node_page_state+0x1f/0xa0
      [  212.986060]  ? match_held_lock+0x1b/0x240
      [  212.990567]  ? find_held_lock+0x85/0xa0
      [  212.994989]  ? do_user_addr_fault+0x349/0x5b0
      [  212.999387]  ? lock_downgrade+0x360/0x360
      [  213.003713]  ? find_held_lock+0x85/0xa0
      [  213.007972]  ? __fget_light+0xa1/0xf0
      [  213.012143]  ? sockfd_lookup_light+0x91/0xb0
      [  213.016165]  __sys_sendmsg+0xba/0x130
      [  213.020040]  ? __sys_sendmsg_sock+0xb0/0xb0
      [  213.023870]  ? handle_mm_fault+0x337/0x470
      [  213.027592]  ? page_fault+0x8/0x30
      [  213.031316]  ? lockdep_hardirqs_off+0xbe/0x100
      [  213.034999]  ? mark_held_locks+0x24/0x90
      [  213.038671]  ? do_syscall_64+0x1e/0xe0
      [  213.042297]  do_syscall_64+0x74/0xe0
      [  213.045828]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  213.049354] RIP: 0033:0x7fe7c527c7b8
      [  213.052792] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f
      0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
      [  213.060269] RSP: 002b:00007ffc3f7908a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  213.064144] RAX: ffffffffffffffda RBX: 000000005d34716f RCX: 00007fe7c527c7b8
      [  213.068094] RDX: 0000000000000000 RSI: 00007ffc3f790910 RDI: 0000000000000003
      [  213.072109] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fe7c5340cc0
      [  213.076113] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000080
      [  213.080146] R13: 0000000000480640 R14: 0000000000000080 R15: 0000000000000000
      [  213.084147] Modules linked in: act_gact cls_flower sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc intel_rapl_msr intel_rapl_common
      [<1;69;32Msb_edac rdma_ucm rdma_cm x86_pkg_temp_thermal iw_cm intel_powerclamp ib_cm coretemp kvm_intel kvm irqbypass mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pc
      lmul crc32c_intel ghash_clmulni_intel mlx5_core intel_cstate intel_uncore iTCO_wdt igb iTCO_vendor_support mlxfw mei_me ptp ses intel_rapl_perf mei pcspkr ipmi
      _ssif i2c_i801 joydev enclosure pps_core lpc_ich ioatdma wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_vram_helpe
      r ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas
      [  213.112326] CR2: 0000000000000010
      [  213.117429] ---[ end trace adb58eb0a4ee6283 ]---
      
      Verify that q pointer is not NULL before setting the 'flags' field.
      
      Fixes: 3f05e6886a59 ("net_sched: unset TCQ_F_CAN_BYPASS when adding filters")
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Sasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60e9babf
    • N
      net: bridge: stp: don't cache eth dest pointer before skb pull · b72fb8de
      Nikolay Aleksandrov 提交于
      [ Upstream commit 2446a68ae6a8cee6d480e2f5b52f5007c7c41312 ]
      
      Don't cache eth dest pointer before calling pskb_may_pull.
      
      Fixes: cf0f02d0 ("[BRIDGE]: use llc for receiving STP packets")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b72fb8de
    • N
      net: bridge: don't cache ether dest pointer on input · 78701843
      Nikolay Aleksandrov 提交于
      [ Upstream commit 3d26eb8ad1e9b906433903ce05f775cf038e747f ]
      
      We would cache ether dst pointer on input in br_handle_frame_finish but
      after the neigh suppress code that could lead to a stale pointer since
      both ipv4 and ipv6 suppress code do pskb_may_pull. This means we have to
      always reload it after the suppress code so there's no point in having
      it cached just retrieve it directly.
      
      Fixes: 057658cb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports")
      Fixes: ed842fae ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      78701843