1. 31 5月, 2020 5 次提交
    • P
      mptcp: fix race between MP_JOIN and close · 10f6d46c
      Paolo Abeni 提交于
      If a MP_JOIN subflow completes the 3whs while another
      CPU is closing the master msk, we can hit the
      following race:
      
      CPU1                                    CPU2
      
      close()
       mptcp_close
                                              subflow_syn_recv_sock
                                               mptcp_token_get_sock
                                               mptcp_finish_join
                                                inet_sk_state_load
        mptcp_token_destroy
        inet_sk_state_store(TCP_CLOSE)
        __mptcp_flush_join_list()
                                                mptcp_sock_graft
                                                list_add_tail
        sk_common_release
         sock_orphan()
       <socket free>
      
      The MP_JOIN socket will be leaked. Additionally we can hit
      UaF for the msk 'struct socket' referenced via the 'conn'
      field.
      
      This change try to address the issue introducing some
      synchronization between the MP_JOIN 3whs and mptcp_close
      via the join_list spinlock. If we detect the msk is closing
      the MP_JOIN socket is closed, too.
      
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10f6d46c
    • P
      mptcp: fix unblocking connect() · 41be81a8
      Paolo Abeni 提交于
      Currently unblocking connect() on MPTCP sockets fails frequently.
      If mptcp_stream_connect() is invoked to complete a previously
      attempted unblocking connection, it will still try to create
      the first subflow via __mptcp_socket_create(). If the 3whs is
      completed and the 'can_ack' flag is already set, the latter
      will fail with -EINVAL.
      
      This change addresses the issue checking for pending connect and
      delegating the completion to the first subflow. Additionally
      do msk addresses and sk_state changes only when needed.
      
      Fixes: 2303f994 ("mptcp: Associate MPTCP context with TCP socket")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41be81a8
    • W
      net/sched: act_ct: add nat mangle action only for NAT-conntrack · 05aa69e5
      wenxu 提交于
      Currently add nat mangle action with comparing invert and orig tuple.
      It is better to check IPS_NAT_MASK flags first to avoid non necessary
      memcmp for non-NAT conntrack.
      Signed-off-by: Nwenxu <wenxu@ucloud.cn>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05aa69e5
    • Y
      devinet: fix memleak in inetdev_init() · 1b49cd71
      Yang Yingliang 提交于
      When devinet_sysctl_register() failed, the memory allocated
      in neigh_parms_alloc() should be freed.
      
      Fixes: 20e61da7 ("ipv4: fail early when creating netdev named all or default")
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b49cd71
    • J
      virtio_vsock: Fix race condition in virtio_transport_recv_pkt · 8692cefc
      Jia He 提交于
      When client on the host tries to connect(SOCK_STREAM, O_NONBLOCK) to the
      server on the guest, there will be a panic on a ThunderX2 (armv8a server):
      
      [  463.718844] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [  463.718848] Mem abort info:
      [  463.718849]   ESR = 0x96000044
      [  463.718852]   EC = 0x25: DABT (current EL), IL = 32 bits
      [  463.718853]   SET = 0, FnV = 0
      [  463.718854]   EA = 0, S1PTW = 0
      [  463.718855] Data abort info:
      [  463.718856]   ISV = 0, ISS = 0x00000044
      [  463.718857]   CM = 0, WnR = 1
      [  463.718859] user pgtable: 4k pages, 48-bit VAs, pgdp=0000008f6f6e9000
      [  463.718861] [0000000000000000] pgd=0000000000000000
      [  463.718866] Internal error: Oops: 96000044 [#1] SMP
      [...]
      [  463.718977] CPU: 213 PID: 5040 Comm: vhost-5032 Tainted: G           O      5.7.0-rc7+ #139
      [  463.718980] Hardware name: GIGABYTE R281-T91-00/MT91-FS1-00, BIOS F06 09/25/2018
      [  463.718982] pstate: 60400009 (nZCv daif +PAN -UAO)
      [  463.718995] pc : virtio_transport_recv_pkt+0x4c8/0xd40 [vmw_vsock_virtio_transport_common]
      [  463.718999] lr : virtio_transport_recv_pkt+0x1fc/0xd40 [vmw_vsock_virtio_transport_common]
      [  463.719000] sp : ffff80002dbe3c40
      [...]
      [  463.719025] Call trace:
      [  463.719030]  virtio_transport_recv_pkt+0x4c8/0xd40 [vmw_vsock_virtio_transport_common]
      [  463.719034]  vhost_vsock_handle_tx_kick+0x360/0x408 [vhost_vsock]
      [  463.719041]  vhost_worker+0x100/0x1a0 [vhost]
      [  463.719048]  kthread+0x128/0x130
      [  463.719052]  ret_from_fork+0x10/0x18
      
      The race condition is as follows:
      Task1                                Task2
      =====                                =====
      __sock_release                       virtio_transport_recv_pkt
        __vsock_release                      vsock_find_bound_socket (found sk)
          lock_sock_nested
          vsock_remove_sock
          sock_orphan
            sk_set_socket(sk, NULL)
          sk->sk_shutdown = SHUTDOWN_MASK
          ...
          release_sock
                                          lock_sock
                                             virtio_transport_recv_connecting
                                               sk->sk_socket->state (panic!)
      
      The root cause is that vsock_find_bound_socket can't hold the lock_sock,
      so there is a small race window between vsock_find_bound_socket() and
      lock_sock(). If __vsock_release() is running in another task,
      sk->sk_socket will be set to NULL inadvertently.
      
      This fixes it by checking sk->sk_shutdown(suggested by Stefano) after
      lock_sock since sk->sk_shutdown is set to SHUTDOWN_MASK under the
      protection of lock_sock_nested.
      Signed-off-by: NJia He <justin.he@arm.com>
      Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8692cefc
  2. 30 5月, 2020 1 次提交
  3. 29 5月, 2020 2 次提交
    • X
      xfrm: fix a NULL-ptr deref in xfrm_local_error · f6a23d85
      Xin Long 提交于
      This patch is to fix a crash:
      
        [ ] kasan: GPF could be caused by NULL-ptr deref or user memory access
        [ ] general protection fault: 0000 [#1] SMP KASAN PTI
        [ ] RIP: 0010:ipv6_local_error+0xac/0x7a0
        [ ] Call Trace:
        [ ]  xfrm6_local_error+0x1eb/0x300
        [ ]  xfrm_local_error+0x95/0x130
        [ ]  __xfrm6_output+0x65f/0xb50
        [ ]  xfrm6_output+0x106/0x46f
        [ ]  udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel]
        [ ]  vxlan_xmit_one+0xbc6/0x2c60 [vxlan]
        [ ]  vxlan_xmit+0x6a0/0x4276 [vxlan]
        [ ]  dev_hard_start_xmit+0x165/0x820
        [ ]  __dev_queue_xmit+0x1ff0/0x2b90
        [ ]  ip_finish_output2+0xd3e/0x1480
        [ ]  ip_do_fragment+0x182d/0x2210
        [ ]  ip_output+0x1d0/0x510
        [ ]  ip_send_skb+0x37/0xa0
        [ ]  raw_sendmsg+0x1b4c/0x2b80
        [ ]  sock_sendmsg+0xc0/0x110
      
      This occurred when sending a v4 skb over vxlan6 over ipsec, in which case
      skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in
      xfrm_local_error(). Then it will go to xfrm6_local_error() where it tries
      to get ipv6 info from a ipv4 sk.
      
      This issue was actually fixed by Commit 628e341f ("xfrm: make local
      error reporting more robust"), but brought back by Commit 844d4874
      ("xfrm: choose protocol family by skb protocol").
      
      So to fix it, we should call xfrm6_local_error() only when skb->protocol
      is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6.
      
      Fixes: 844d4874 ("xfrm: choose protocol family by skb protocol")
      Reported-by: NXiumei Mu <xmu@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      f6a23d85
    • J
      sctp: check assoc before SCTP_ADDR_{MADE_PRIM, ADDED} event · 45ebf73e
      Jonas Falkevik 提交于
      Make sure SCTP_ADDR_{MADE_PRIM,ADDED} are sent only for associations
      that have been established.
      
      These events are described in rfc6458#section-6.1
      SCTP_PEER_ADDR_CHANGE:
      This tag indicates that an address that is
      part of an existing association has experienced a change of
      state (e.g., a failure or return to service of the reachability
      of an endpoint via a specific transport address).
      Signed-off-by: NJonas Falkevik <jonas.falkevik@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45ebf73e
  4. 28 5月, 2020 4 次提交
    • V
      net: dsa: declare lockless TX feature for slave ports · 2b86cb82
      Vladimir Oltean 提交于
      Be there a platform with the following layout:
      
            Regular NIC
             |
             +----> DSA master for switch port
                     |
                     +----> DSA master for another switch port
      
      After changing DSA back to static lockdep class keys in commit
      1a33e10e ("net: partially revert dynamic lockdep key changes"), this
      kernel splat can be seen:
      
      [   13.361198] ============================================
      [   13.366524] WARNING: possible recursive locking detected
      [   13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted
      [   13.377874] --------------------------------------------
      [   13.383201] swapper/0/0 is trying to acquire lock:
      [   13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.397879]
      [   13.397879] but task is already holding lock:
      [   13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.413593]
      [   13.413593] other info that might help us debug this:
      [   13.420140]  Possible unsafe locking scenario:
      [   13.420140]
      [   13.426075]        CPU0
      [   13.428523]        ----
      [   13.430969]   lock(&dsa_slave_netdev_xmit_lock_key);
      [   13.435946]   lock(&dsa_slave_netdev_xmit_lock_key);
      [   13.440924]
      [   13.440924]  *** DEADLOCK ***
      [   13.440924]
      [   13.446860]  May be due to missing lock nesting notation
      [   13.446860]
      [   13.453668] 6 locks held by swapper/0/0:
      [   13.457598]  #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
      [   13.466593]  #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560
      [   13.474803]  #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10
      [   13.483886]  #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
      [   13.492793]  #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.503094]  #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
      [   13.512000]
      [   13.512000] stack backtrace:
      [   13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988
      [   13.530421] Call trace:
      [   13.532871]  dump_backtrace+0x0/0x1d8
      [   13.536539]  show_stack+0x24/0x30
      [   13.539862]  dump_stack+0xe8/0x150
      [   13.543271]  __lock_acquire+0x1030/0x1678
      [   13.547290]  lock_acquire+0xf8/0x458
      [   13.550873]  _raw_spin_lock+0x44/0x58
      [   13.554543]  __dev_queue_xmit+0x84c/0xbe0
      [   13.558562]  dev_queue_xmit+0x24/0x30
      [   13.562232]  dsa_slave_xmit+0xe0/0x128
      [   13.565988]  dev_hard_start_xmit+0xf4/0x448
      [   13.570182]  __dev_queue_xmit+0x808/0xbe0
      [   13.574200]  dev_queue_xmit+0x24/0x30
      [   13.577869]  neigh_resolve_output+0x15c/0x220
      [   13.582237]  ip6_finish_output2+0x244/0xb10
      [   13.586430]  __ip6_finish_output+0x1dc/0x298
      [   13.590709]  ip6_output+0x84/0x358
      [   13.594116]  mld_sendpack+0x2bc/0x560
      [   13.597786]  mld_ifc_timer_expire+0x210/0x390
      [   13.602153]  call_timer_fn+0xcc/0x400
      [   13.605822]  run_timer_softirq+0x588/0x6e0
      [   13.609927]  __do_softirq+0x118/0x590
      [   13.613597]  irq_exit+0x13c/0x148
      [   13.616918]  __handle_domain_irq+0x6c/0xc0
      [   13.621023]  gic_handle_irq+0x6c/0x160
      [   13.624779]  el1_irq+0xbc/0x180
      [   13.627927]  cpuidle_enter_state+0xb4/0x4d0
      [   13.632120]  cpuidle_enter+0x3c/0x50
      [   13.635703]  call_cpuidle+0x44/0x78
      [   13.639199]  do_idle+0x228/0x2c8
      [   13.642433]  cpu_startup_entry+0x2c/0x48
      [   13.646363]  rest_init+0x1ac/0x280
      [   13.649773]  arch_call_rest_init+0x14/0x1c
      [   13.653878]  start_kernel+0x490/0x4bc
      
      Lockdep keys themselves were added in commit ab92d68f ("net: core:
      add generic lockdep keys"), and it's very likely that this splat existed
      since then, but I have no real way to check, since this stacked platform
      wasn't supported by mainline back then.
      
      >From Taehee's own words:
      
        This patch was considered that all stackable devices have LLTX flag.
        But the dsa doesn't have LLTX, so this splat happened.
        After this patch, dsa shares the same lockdep class key.
        On the nested dsa interface architecture, which you illustrated,
        the same lockdep class key will be used in __dev_queue_xmit() because
        dsa doesn't have LLTX.
        So that lockdep detects deadlock because the same lockdep class key is
        used recursively although actually the different locks are used.
        There are some ways to fix this problem.
      
        1. using NETIF_F_LLTX flag.
        If possible, using the LLTX flag is a very clear way for it.
        But I'm so sorry I don't know whether the dsa could have LLTX or not.
      
        2. using dynamic lockdep again.
        It means that each interface uses a separate lockdep class key.
        So, lockdep will not detect recursive locking.
        But this way has a problem that it could consume lockdep class key
        too many.
        Currently, lockdep can have 8192 lockdep class keys.
         - you can see this number with the following command.
           cat /proc/lockdep_stats
           lock-classes:                         1251 [max: 8192]
           ...
           The [max: 8192] means that the maximum number of lockdep class keys.
        If too many lockdep class keys are registered, lockdep stops to work.
        So, using a dynamic(separated) lockdep class key should be considered
        carefully.
        In addition, updating lockdep class key routine might have to be existing.
        (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key())
      
        3. Using lockdep subclass.
        A lockdep class key could have 8 subclasses.
        The different subclass is considered different locks by lockdep
        infrastructure.
        But "lock-classes" is not counted by subclasses.
        So, it could avoid stopping lockdep infrastructure by an overflow of
        lockdep class keys.
        This approach should also have an updating lockdep class key routine.
        (lockdep_set_subclass())
      
        4. Using nonvalidate lockdep class key.
        The lockdep infrastructure supports nonvalidate lockdep class key type.
        It means this lockdep is not validated by lockdep infrastructure.
        So, the splat will not happen but lockdep couldn't detect real deadlock
        case because lockdep really doesn't validate it.
        I think this should be used for really special cases.
        (lockdep_set_novalidate_class())
      
      Further discussion here:
      https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/
      
      There appears to be no negative side-effect to declaring lockless TX for
      the DSA virtual interfaces, which means they handle their own locking.
      So that's what we do to make the splat go away.
      
      Patch tested in a wide variety of cases: unicast, multicast, PTP, etc.
      
      Fixes: ab92d68f ("net: core: add generic lockdep keys")
      Suggested-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b86cb82
    • A
      bridge: multicast: work around clang bug · b3b6a84c
      Arnd Bergmann 提交于
      Clang-10 and clang-11 run into a corner case of the register
      allocator on 32-bit ARM, leading to excessive stack usage from
      register spilling:
      
      net/bridge/br_multicast.c:2422:6: error: stack frame size of 1472 bytes in function 'br_multicast_get_stats' [-Werror,-Wframe-larger-than=]
      
      Work around this by marking one of the internal functions as
      noinline_for_stack.
      
      Link: https://bugs.llvm.org/show_bug.cgi?id=45802#c9Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b3b6a84c
    • S
      vsock: fix timeout in vsock_accept() · 7e0afbdf
      Stefano Garzarella 提交于
      The accept(2) is an "input" socket interface, so we should use
      SO_RCVTIMEO instead of SO_SNDTIMEO to set the timeout.
      
      So this patch replace sock_sndtimeo() with sock_rcvtimeo() to
      use the right timeout in the vsock_accept().
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Signed-off-by: NStefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: NJorgen Hansen <jhansen@vmware.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e0afbdf
    • D
      net/sched: fix infinite loop in sch_fq_pie · bb2f930d
      Davide Caratti 提交于
      this command hangs forever:
      
       # tc qdisc add dev eth0 root fq_pie flows 65536
      
       watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [tc:1028]
       [...]
       CPU: 1 PID: 1028 Comm: tc Not tainted 5.7.0-rc6+ #167
       RIP: 0010:fq_pie_init+0x60e/0x8b7 [sch_fq_pie]
       Code: 4c 89 65 50 48 89 f8 48 c1 e8 03 42 80 3c 30 00 0f 85 2a 02 00 00 48 8d 7d 10 4c 89 65 58 48 89 f8 48 c1 e8 03 42 80 3c 30 00 <0f> 85 a7 01 00 00 48 8d 7d 18 48 c7 45 10 46 c3 23 00 48 89 f8 48
       RSP: 0018:ffff888138d67468 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
       RAX: 1ffff9200018d2b2 RBX: ffff888139c1c400 RCX: ffffffffffffffff
       RDX: 000000000000c5e8 RSI: ffffc900000e5000 RDI: ffffc90000c69590
       RBP: ffffc90000c69580 R08: fffffbfff79a9699 R09: fffffbfff79a9699
       R10: 0000000000000700 R11: fffffbfff79a9698 R12: ffffc90000c695d0
       R13: 0000000000000000 R14: dffffc0000000000 R15: 000000002347c5e8
       FS:  00007f01e1850e40(0000) GS:ffff88814c880000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000000000067c340 CR3: 000000013864c000 CR4: 0000000000340ee0
       Call Trace:
        qdisc_create+0x3fd/0xeb0
        tc_modify_qdisc+0x3be/0x14a0
        rtnetlink_rcv_msg+0x5f3/0x920
        netlink_rcv_skb+0x121/0x350
        netlink_unicast+0x439/0x630
        netlink_sendmsg+0x714/0xbf0
        sock_sendmsg+0xe2/0x110
        ____sys_sendmsg+0x5b4/0x890
        ___sys_sendmsg+0xe9/0x160
        __sys_sendmsg+0xd3/0x170
        do_syscall_64+0x9a/0x370
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      we can't accept 65536 as a valid number for 'nflows', because the loop on
      'idx' in fq_pie_init() will never end. The extack message is correct, but
      it doesn't say that 0 is not a valid number for 'flows': while at it, fix
      this also. Add a tdc selftest to check correct validation of 'flows'.
      
      CC: Ivan Vecera <ivecera@redhat.com>
      Fixes: ec97ecf1 ("net: sched: add Flow Queue PIE packet scheduler")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Reviewed-by: NIvan Vecera <ivecera@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bb2f930d
  5. 27 5月, 2020 9 次提交
  6. 26 5月, 2020 7 次提交
  7. 25 5月, 2020 3 次提交
    • X
      xfrm: fix a warning in xfrm_policy_insert_list · ed17b8d3
      Xin Long 提交于
      This waring can be triggered simply by:
      
        # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
          priority 1 mark 0 mask 0x10  #[1]
        # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
          priority 2 mark 0 mask 0x1   #[2]
        # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
          priority 2 mark 0 mask 0x10  #[3]
      
      Then dmesg shows:
      
        [ ] WARNING: CPU: 1 PID: 7265 at net/xfrm/xfrm_policy.c:1548
        [ ] RIP: 0010:xfrm_policy_insert_list+0x2f2/0x1030
        [ ] Call Trace:
        [ ]  xfrm_policy_inexact_insert+0x85/0xe50
        [ ]  xfrm_policy_insert+0x4ba/0x680
        [ ]  xfrm_add_policy+0x246/0x4d0
        [ ]  xfrm_user_rcv_msg+0x331/0x5c0
        [ ]  netlink_rcv_skb+0x121/0x350
        [ ]  xfrm_netlink_rcv+0x66/0x80
        [ ]  netlink_unicast+0x439/0x630
        [ ]  netlink_sendmsg+0x714/0xbf0
        [ ]  sock_sendmsg+0xe2/0x110
      
      The issue was introduced by Commit 7cb8a939 ("xfrm: Allow inserting
      policies with matching mark and different priorities"). After that, the
      policies [1] and [2] would be able to be added with different priorities.
      
      However, policy [3] will actually match both [1] and [2]. Policy [1]
      was matched due to the 1st 'return true' in xfrm_policy_mark_match(),
      and policy [2] was matched due to the 2nd 'return true' in there. It
      caused WARN_ON() in xfrm_policy_insert_list().
      
      This patch is to fix it by only (the same value and priority) as the
      same policy in xfrm_policy_mark_match().
      
      Thanks to Yuehaibing, we could make this fix better.
      
      v1->v2:
        - check policy->mark.v == pol->mark.v only without mask.
      
      Fixes: 7cb8a939 ("xfrm: Allow inserting policies with matching mark and different priorities")
      Reported-by: NXiumei Mu <xmu@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      ed17b8d3
    • J
      cfg80211: fix debugfs rename crash · 0bbab5f0
      Johannes Berg 提交于
      Removing the "if (IS_ERR(dir)) dir = NULL;" check only works
      if we adjust the remaining code to not rely on it being NULL.
      Check IS_ERR_OR_NULL() before attempting to dereference it.
      
      I'm not actually entirely sure this fixes the syzbot crash as
      the kernel config indicates that they do have DEBUG_FS in the
      kernel, but this is what I found when looking there.
      
      Cc: stable@vger.kernel.org
      Fixes: d82574a8 ("cfg80211: no need to check return value of debugfs_create functions")
      Reported-by: syzbot+fd5332e429401bf42d18@syzkaller.appspotmail.com
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: https://lore.kernel.org/r/20200525113816.fc4da3ec3d4b.Ica63a110679819eaa9fb3bc1b7437d96b1fd187d@changeidSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
      0bbab5f0
    • L
      mac80211: mesh: fix discovery timer re-arming issue / crash · e2d4a80f
      Linus Lüssing 提交于
      On a non-forwarding 802.11s link between two fairly busy
      neighboring nodes (iperf with -P 16 at ~850MBit/s TCP;
      1733.3 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 4), so with
      frequent PREQ retries, usually after around 30-40 seconds the
      following crash would occur:
      
      [ 1110.822428] Unable to handle kernel read from unreadable memory at virtual address 00000000
      [ 1110.830786] Mem abort info:
      [ 1110.833573]   Exception class = IABT (current EL), IL = 32 bits
      [ 1110.839494]   SET = 0, FnV = 0
      [ 1110.842546]   EA = 0, S1PTW = 0
      [ 1110.845678] user pgtable: 4k pages, 48-bit VAs, pgd = ffff800076386000
      [ 1110.852204] [0000000000000000] *pgd=00000000f6322003, *pud=00000000f62de003, *pmd=0000000000000000
      [ 1110.861167] Internal error: Oops: 86000004 [#1] PREEMPT SMP
      [ 1110.866730] Modules linked in: pppoe ppp_async batman_adv ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 usb_storage xhci_plat_hcd xhci_pci xhci_hcd dwc3 usbcore usb_common
      [ 1110.932190] Process swapper/3 (pid: 0, stack limit = 0xffff0000090c8000)
      [ 1110.938884] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.162 #0
      [ 1110.944965] Hardware name: LS1043A RGW Board (DT)
      [ 1110.949658] task: ffff8000787a81c0 task.stack: ffff0000090c8000
      [ 1110.955568] PC is at 0x0
      [ 1110.958097] LR is at call_timer_fn.isra.27+0x24/0x78
      [ 1110.963055] pc : [<0000000000000000>] lr : [<ffff0000080ff29c>] pstate: 00400145
      [ 1110.970440] sp : ffff00000801be10
      [ 1110.973744] x29: ffff00000801be10 x28: ffff000008bf7018
      [ 1110.979047] x27: ffff000008bf87c8 x26: ffff000008c160c0
      [ 1110.984352] x25: 0000000000000000 x24: 0000000000000000
      [ 1110.989657] x23: dead000000000200 x22: 0000000000000000
      [ 1110.994959] x21: 0000000000000000 x20: 0000000000000101
      [ 1111.000262] x19: ffff8000787a81c0 x18: 0000000000000000
      [ 1111.005565] x17: ffff0000089167b0 x16: 0000000000000058
      [ 1111.010868] x15: ffff0000089167b0 x14: 0000000000000000
      [ 1111.016172] x13: ffff000008916788 x12: 0000000000000040
      [ 1111.021475] x11: ffff80007fda9af0 x10: 0000000000000001
      [ 1111.026777] x9 : ffff00000801bea0 x8 : 0000000000000004
      [ 1111.032080] x7 : 0000000000000000 x6 : ffff80007fda9aa8
      [ 1111.037383] x5 : ffff00000801bea0 x4 : 0000000000000010
      [ 1111.042685] x3 : ffff00000801be98 x2 : 0000000000000614
      [ 1111.047988] x1 : 0000000000000000 x0 : 0000000000000000
      [ 1111.053290] Call trace:
      [ 1111.055728] Exception stack(0xffff00000801bcd0 to 0xffff00000801be10)
      [ 1111.062158] bcc0:                                   0000000000000000 0000000000000000
      [ 1111.069978] bce0: 0000000000000614 ffff00000801be98 0000000000000010 ffff00000801bea0
      [ 1111.077798] bd00: ffff80007fda9aa8 0000000000000000 0000000000000004 ffff00000801bea0
      [ 1111.085618] bd20: 0000000000000001 ffff80007fda9af0 0000000000000040 ffff000008916788
      [ 1111.093437] bd40: 0000000000000000 ffff0000089167b0 0000000000000058 ffff0000089167b0
      [ 1111.101256] bd60: 0000000000000000 ffff8000787a81c0 0000000000000101 0000000000000000
      [ 1111.109075] bd80: 0000000000000000 dead000000000200 0000000000000000 0000000000000000
      [ 1111.116895] bda0: ffff000008c160c0 ffff000008bf87c8 ffff000008bf7018 ffff00000801be10
      [ 1111.124715] bdc0: ffff0000080ff29c ffff00000801be10 0000000000000000 0000000000400145
      [ 1111.132534] bde0: ffff8000787a81c0 ffff00000801bde8 0000ffffffffffff 000001029eb19be8
      [ 1111.140353] be00: ffff00000801be10 0000000000000000
      [ 1111.145220] [<          (null)>]           (null)
      [ 1111.149917] [<ffff0000080ff77c>] run_timer_softirq+0x184/0x398
      [ 1111.155741] [<ffff000008081938>] __do_softirq+0x100/0x1fc
      [ 1111.161130] [<ffff0000080a2e28>] irq_exit+0x80/0xd8
      [ 1111.166002] [<ffff0000080ea708>] __handle_domain_irq+0x88/0xb0
      [ 1111.171825] [<ffff000008081678>] gic_handle_irq+0x68/0xb0
      [ 1111.177213] Exception stack(0xffff0000090cbe30 to 0xffff0000090cbf70)
      [ 1111.183642] be20:                                   0000000000000020 0000000000000000
      [ 1111.191461] be40: 0000000000000001 0000000000000000 00008000771af000 0000000000000000
      [ 1111.199281] be60: ffff000008c95180 0000000000000000 ffff000008c19360 ffff0000090cbef0
      [ 1111.207101] be80: 0000000000000810 0000000000000400 0000000000000098 ffff000000000000
      [ 1111.214920] bea0: 0000000000000001 ffff0000089167b0 0000000000000000 ffff0000089167b0
      [ 1111.222740] bec0: 0000000000000000 ffff000008c198e8 ffff000008bf7018 ffff000008c19000
      [ 1111.230559] bee0: 0000000000000000 0000000000000000 ffff8000787a81c0 ffff000008018000
      [ 1111.238380] bf00: ffff00000801c000 ffff00000913ba34 ffff8000787a81c0 ffff0000090cbf70
      [ 1111.246199] bf20: ffff0000080857cc ffff0000090cbf70 ffff0000080857d0 0000000000400145
      [ 1111.254020] bf40: ffff000008018000 ffff00000801c000 ffffffffffffffff ffff0000080fa574
      [ 1111.261838] bf60: ffff0000090cbf70 ffff0000080857d0
      [ 1111.266706] [<ffff0000080832e8>] el1_irq+0xe8/0x18c
      [ 1111.271576] [<ffff0000080857d0>] arch_cpu_idle+0x10/0x18
      [ 1111.276880] [<ffff0000080d7de4>] do_idle+0xec/0x1b8
      [ 1111.281748] [<ffff0000080d8020>] cpu_startup_entry+0x20/0x28
      [ 1111.287399] [<ffff00000808f81c>] secondary_start_kernel+0x104/0x110
      [ 1111.293662] Code: bad PC value
      [ 1111.296710] ---[ end trace 555b6ca4363c3edd ]---
      [ 1111.301318] Kernel panic - not syncing: Fatal exception in interrupt
      [ 1111.307661] SMP: stopping secondary CPUs
      [ 1111.311574] Kernel Offset: disabled
      [ 1111.315053] CPU features: 0x0002000
      [ 1111.318530] Memory Limit: none
      [ 1111.321575] Rebooting in 3 seconds..
      
      With some added debug output / delays we were able to push the crash from
      the timer callback runner into the callback function and by that shedding
      some light on which object holding the timer gets corrupted:
      
      [  401.720899] Unable to handle kernel read from unreadable memory at virtual address 00000868
      [...]
      [  402.335836] [<ffff0000088fafa4>] _raw_spin_lock_bh+0x14/0x48
      [  402.341548] [<ffff000000dbe684>] mesh_path_timer+0x10c/0x248 [mac80211]
      [  402.348154] [<ffff0000080ff29c>] call_timer_fn.isra.27+0x24/0x78
      [  402.354150] [<ffff0000080ff77c>] run_timer_softirq+0x184/0x398
      [  402.359974] [<ffff000008081938>] __do_softirq+0x100/0x1fc
      [  402.365362] [<ffff0000080a2e28>] irq_exit+0x80/0xd8
      [  402.370231] [<ffff0000080ea708>] __handle_domain_irq+0x88/0xb0
      [  402.376053] [<ffff000008081678>] gic_handle_irq+0x68/0xb0
      
      The issue happens due to the following sequence of events:
      
      1) mesh_path_start_discovery():
      -> spin_unlock_bh(&mpath->state_lock) before mesh_path_sel_frame_tx()
      
      2) mesh_path_free_rcu()
      -> del_timer_sync(&mpath->timer)
         [...]
      -> kfree_rcu(mpath)
      
      3) mesh_path_start_discovery():
      -> mod_timer(&mpath->timer, ...)
         [...]
      -> rcu_read_unlock()
      
      4) mesh_path_free_rcu()'s kfree_rcu():
      -> kfree(mpath)
      
      5) mesh_path_timer() starts after timeout, using freed mpath object
      
      So a use-after-free issue due to a timer re-arming bug caused by an
      early spin-unlocking.
      
      This patch fixes this issue by re-checking if mpath is about to be
      free'd and if so bails out of re-arming the timer.
      
      Cc: stable@vger.kernel.org
      Fixes: 050ac52c ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol")
      Cc: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: NLinus Lüssing <ll@simonwunderlich.de>
      Link: https://lore.kernel.org/r/20200522170413.14973-1-linus.luessing@c0d3.blueSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
      e2d4a80f
  8. 23 5月, 2020 4 次提交
    • Q
      rxrpc: Fix a memory leak in rxkad_verify_response() · f45d01f4
      Qiushi Wu 提交于
      A ticket was not released after a call of the function
      "rxkad_decrypt_ticket" failed. Thus replace the jump target
      "temporary_error_free_resp" by "temporary_error_free_ticket".
      
      Fixes: 8c2f826d ("rxrpc: Don't put crypto buffers on the stack")
      Signed-off-by: NQiushi Wu <wu000273@umn.edu>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Markus Elfring <Markus.Elfring@web.de>
      f45d01f4
    • J
      sctp: Start shutdown on association restart if in SHUTDOWN-SENT state and socket is closed · d3e8e4c1
      Jere Leppänen 提交于
      Commit bdf6fa52 ("sctp: handle association restarts when the
      socket is closed.") starts shutdown when an association is restarted,
      if in SHUTDOWN-PENDING state and the socket is closed. However, the
      rationale stated in that commit applies also when in SHUTDOWN-SENT
      state - we don't want to move an association to ESTABLISHED state when
      the socket has been closed, because that results in an association
      that is unreachable from user space.
      
      The problem scenario:
      
      1.  Client crashes and/or restarts.
      
      2.  Server (using one-to-one socket) calls close(). SHUTDOWN is lost.
      
      3.  Client reconnects using the same addresses and ports.
      
      4.  Server's association is restarted. The association and the socket
          move to ESTABLISHED state, even though the server process has
          closed its descriptor.
      
      Also, after step 4 when the server process exits, some resources are
      leaked in an attempt to release the underlying inet sock structure in
      ESTABLISHED state:
      
          IPv4: Attempt to release TCP socket in state 1 00000000377288c7
      
      Fix by acting the same way as in SHUTDOWN-PENDING state. That is, if
      an association is restarted in SHUTDOWN-SENT state and the socket is
      closed, then start shutdown and don't move the association or the
      socket to ESTABLISHED state.
      
      Fixes: bdf6fa52 ("sctp: handle association restarts when the socket is closed.")
      Signed-off-by: NJere Leppänen <jere.leppanen@nokia.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3e8e4c1
    • E
      tipc: block BH before using dst_cache · 13788174
      Eric Dumazet 提交于
      dst_cache_get() documents it must be used with BH disabled.
      
      sysbot reported :
      
      BUG: using smp_processor_id() in preemptible [00000000] code: /21697
      caller is dst_cache_get+0x3a/0xb0 net/core/dst_cache.c:68
      CPU: 0 PID: 21697 Comm:  Not tainted 5.7.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       check_preemption_disabled lib/smp_processor_id.c:47 [inline]
       debug_smp_processor_id.cold+0x88/0x9b lib/smp_processor_id.c:57
       dst_cache_get+0x3a/0xb0 net/core/dst_cache.c:68
       tipc_udp_xmit.isra.0+0xb9/0xad0 net/tipc/udp_media.c:164
       tipc_udp_send_msg+0x3e6/0x490 net/tipc/udp_media.c:244
       tipc_bearer_xmit_skb+0x1de/0x3f0 net/tipc/bearer.c:526
       tipc_enable_bearer+0xb2f/0xd60 net/tipc/bearer.c:331
       __tipc_nl_bearer_enable+0x2bf/0x390 net/tipc/bearer.c:995
       tipc_nl_bearer_enable+0x1e/0x30 net/tipc/bearer.c:1003
       genl_family_rcv_msg_doit net/netlink/genetlink.c:673 [inline]
       genl_family_rcv_msg net/netlink/genetlink.c:718 [inline]
       genl_rcv_msg+0x627/0xdf0 net/netlink/genetlink.c:735
       netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2469
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:746
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362
       ___sys_sendmsg+0x100/0x170 net/socket.c:2416
       __sys_sendmsg+0xec/0x1b0 net/socket.c:2449
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      RIP: 0033:0x45ca29
      
      Fixes: e9c1a793 ("tipc: add dst_cache support for udp media")
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      13788174
    • T
      mptcp: use untruncated hash in ADD_ADDR HMAC · bd697222
      Todd Malsbary 提交于
      There is some ambiguity in the RFC as to whether the ADD_ADDR HMAC is
      the rightmost 64 bits of the entire hash or of the leftmost 160 bits
      of the hash.  The intention, as clarified with the author of the RFC,
      is the entire hash.
      
      This change returns the entire hash from
      mptcp_crypto_hmac_sha (instead of only the first 160 bits), and moves
      any truncation/selection operation on the hash to the caller.
      
      Fixes: 12555a2d ("mptcp: use rightmost 64 bits in ADD_ADDR HMAC")
      Reviewed-by: NChristoph Paasch <cpaasch@apple.com>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NTodd Malsbary <todd.malsbary@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd697222
  9. 22 5月, 2020 5 次提交
    • J
      flow_dissector: Drop BPF flow dissector prog ref on netns cleanup · 5cf65922
      Jakub Sitnicki 提交于
      When attaching a flow dissector program to a network namespace with
      bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.
      
      If netns gets destroyed while a flow dissector is still attached, and there
      are no other references to the prog, we leak the reference and the program
      remains loaded.
      
      Leak can be reproduced by running flow dissector tests from selftests/bpf:
      
        # bpftool prog list
        # ./test_flow_dissector.sh
        ...
        selftests: test_flow_dissector [PASS]
        # bpftool prog list
        4: flow_dissector  name _dissect  tag e314084d332a5338  gpl
                loaded_at 2020-05-20T18:50:53+0200  uid 0
                xlated 552B  jited 355B  memlock 4096B  map_ids 3,4
                btf_id 4
        #
      
      Fix it by detaching the flow dissector program when netns is going away.
      
      Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook")
      Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: NStanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/bpf/20200521083435.560256-1-jakub@cloudflare.com
      5cf65922
    • S
      net: don't return invalid table id error when we fall back to PF_UNSPEC · 41b4bd98
      Sabrina Dubroca 提交于
      In case we can't find a ->dumpit callback for the requested
      (family,type) pair, we fall back to (PF_UNSPEC,type). In effect, we're
      in the same situation as if userspace had requested a PF_UNSPEC
      dump. For RTM_GETROUTE, that handler is rtnl_dump_all, which calls all
      the registered RTM_GETROUTE handlers.
      
      The requested table id may or may not exist for all of those
      families. commit ae677bbb ("net: Don't return invalid table id
      error when dumping all families") fixed the problem when userspace
      explicitly requests a PF_UNSPEC dump, but missed the fallback case.
      
      For example, when we pass ipv6.disable=1 to a kernel with
      CONFIG_IP_MROUTE=y and CONFIG_IP_MROUTE_MULTIPLE_TABLES=y,
      the (PF_INET6, RTM_GETROUTE) handler isn't registered, so we end up in
      rtnl_dump_all, and listing IPv6 routes will unexpectedly print:
      
        # ip -6 r
        Error: ipv4: MR table does not exist.
        Dump terminated
      
      commit ae677bbb introduced the dump_all_families variable, which
      gets set when userspace requests a PF_UNSPEC dump. However, we can't
      simply set the family to PF_UNSPEC in rtnetlink_rcv_msg in the
      fallback case to get dump_all_families == true, because some messages
      types (for example RTM_GETRULE and RTM_GETNEIGH) only register the
      PF_UNSPEC handler and use the family to filter in the kernel what is
      dumped to userspace. We would then export more entries, that userspace
      would have to filter. iproute does that, but other programs may not.
      
      Instead, this patch removes dump_all_families and updates the
      RTM_GETROUTE handlers to check if the family that is being dumped is
      their own. When it's not, which covers both the intentional PF_UNSPEC
      dumps (as dump_all_families did) and the fallback case, ignore the
      missing table id error.
      
      Fixes: cb167893 ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41b4bd98
    • V
      net: ipip: fix wrong address family in init error path · 57ebc8f0
      Vadim Fedorenko 提交于
      In case of error with MPLS support the code is misusing AF_INET
      instead of AF_MPLS.
      
      Fixes: 1b69e7e6 ("ipip: support MPLS over IPv4")
      Signed-off-by: NVadim Fedorenko <vfedorenko@novek.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57ebc8f0
    • V
      net/tls: free record only on encryption error · 635d9398
      Vadim Fedorenko 提交于
      We cannot free record on any transient error because it leads to
      losing previos data. Check socket error to know whether record must
      be freed or not.
      
      Fixes: d10523d0 ("net/tls: free the record on encryption error")
      Signed-off-by: NVadim Fedorenko <vfedorenko@novek.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      635d9398
    • V
      net/tls: fix encryption error checking · a7bff11f
      Vadim Fedorenko 提交于
      bpf_exec_tx_verdict() can return negative value for copied
      variable. In that case this value will be pushed back to caller
      and the real error code will be lost. Fix it using signed type and
      checking for positive value.
      
      Fixes: d10523d0 ("net/tls: free the record on encryption error")
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: NVadim Fedorenko <vfedorenko@novek.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a7bff11f