    net/tunnel: wait until all sk_user_data reader finish before releasing the sock · 3cf7203c
    Hangbin Liu committed
    There is a race condition in vxlan: when a vxlan device is deleted
    while packets are being received, the sock can be released after
    vxlan_sock vs has been fetched from sk_user_data. The later calls to
    vxlan_ecn_decapsulate() and vxlan_get_sk_family() then hit a NULL
    pointer dereference, e.g.:
    
       #0 [ffffa25ec6978a38] machine_kexec at ffffffff8c669757
       #1 [ffffa25ec6978a90] __crash_kexec at ffffffff8c7c0a4d
       #2 [ffffa25ec6978b58] crash_kexec at ffffffff8c7c1c48
       #3 [ffffa25ec6978b60] oops_end at ffffffff8c627f2b
       #4 [ffffa25ec6978b80] page_fault_oops at ffffffff8c678fcb
       #5 [ffffa25ec6978bd8] exc_page_fault at ffffffff8d109542
       #6 [ffffa25ec6978c00] asm_exc_page_fault at ffffffff8d200b62
          [exception RIP: vxlan_ecn_decapsulate+0x3b]
          RIP: ffffffffc1014e7b  RSP: ffffa25ec6978cb0  RFLAGS: 00010246
          RAX: 0000000000000008  RBX: ffff8aa000888000  RCX: 0000000000000000
          RDX: 000000000000000e  RSI: ffff8a9fc7ab803e  RDI: ffff8a9fd1168700
          RBP: ffff8a9fc7ab803e   R8: 0000000000700000   R9: 00000000000010ae
          R10: ffff8a9fcb748980  R11: 0000000000000000  R12: ffff8a9fd1168700
          R13: ffff8aa000888000  R14: 00000000002a0000  R15: 00000000000010ae
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #7 [ffffa25ec6978ce8] vxlan_rcv at ffffffffc10189cd [vxlan]
       #8 [ffffa25ec6978d90] udp_queue_rcv_one_skb at ffffffff8cfb6507
       #9 [ffffa25ec6978dc0] udp_unicast_rcv_skb at ffffffff8cfb6e45
      #10 [ffffa25ec6978dc8] __udp4_lib_rcv at ffffffff8cfb8807
      #11 [ffffa25ec6978e20] ip_protocol_deliver_rcu at ffffffff8cf76951
      #12 [ffffa25ec6978e48] ip_local_deliver at ffffffff8cf76bde
      #13 [ffffa25ec6978ea0] __netif_receive_skb_one_core at ffffffff8cecde9b
      #14 [ffffa25ec6978ec8] process_backlog at ffffffff8cece139
      #15 [ffffa25ec6978f00] __napi_poll at ffffffff8ceced1a
      #16 [ffffa25ec6978f28] net_rx_action at ffffffff8cecf1f3
      #17 [ffffa25ec6978fa0] __softirqentry_text_start at ffffffff8d4000ca
      #18 [ffffa25ec6978ff0] do_softirq at ffffffff8c6fbdc3
    
    Reproducer: https://github.com/Mellanox/ovs-tests/blob/master/test-ovs-vxlan-remove-tunnel-during-traffic.sh
    
    Fix this by waiting for all sk_user_data readers to finish before
    releasing the sock.
    Reported-by: Jianlin Shi <jishi@redhat.com>
    Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
    Fixes: 6a93cc90 ("udp-tunnel: Add a few more UDP tunnel APIs")
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
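
The ordering described in the commit message (clear sk_user_data, wait for readers, then release) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel patch: `struct sock_model`, `struct tunnel_state`, `reader_once()`, `release_sock_model()` and `run_demo()` are hypothetical names, and a plain reader counter stands in for an RCU grace period (in the kernel, such a wait is commonly done with `synchronize_rcu()`; the commit message itself does not say which primitive the patch uses).

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for vxlan_sock and struct sock; not kernel types. */
struct tunnel_state { int family; };

struct sock_model {
    _Atomic(struct tunnel_state *) user_data; /* models sk_user_data */
    atomic_int readers;                       /* readers currently in flight */
};

/* Receive path: returns the family read, or -1 if the tunnel is already gone. */
static int reader_once(struct sock_model *sk)
{
    int family = -1;

    atomic_fetch_add(&sk->readers, 1);        /* enter read-side section */
    struct tunnel_state *vs = atomic_load(&sk->user_data);
    if (vs)
        family = vs->family;                  /* safe: release waits for us */
    atomic_fetch_sub(&sk->readers, 1);        /* leave read-side section */
    return family;
}

/* Teardown path: unpublish, wait for in-flight readers, only then free. */
static void release_sock_model(struct sock_model *sk)
{
    struct tunnel_state *vs =
        atomic_exchange(&sk->user_data, (struct tunnel_state *)NULL);

    while (atomic_load(&sk->readers) != 0)    /* the wait that the fix adds */
        sched_yield();
    free(vs);
}

/* Reader thread: counts reads that saw anything other than 2 or -1. */
static void *reader_loop(void *arg)
{
    struct sock_model *sk = arg;
    intptr_t bad = 0;

    for (int i = 0; i < 100000; i++) {
        int f = reader_once(sk);
        if (f != 2 && f != -1)
            bad++;
    }
    return (void *)bad;
}

/* Runs one reader against a concurrent release; returns the bad-read count. */
int run_demo(void)
{
    struct sock_model sk;
    struct tunnel_state *vs = malloc(sizeof(*vs));
    pthread_t t;
    void *res = NULL;

    if (!vs)
        return -1;
    vs->family = 2;
    atomic_init(&sk.user_data, vs);
    atomic_init(&sk.readers, 0);

    if (pthread_create(&t, NULL, reader_loop, &sk) != 0) {
        free(vs);
        return -1;
    }
    release_sock_model(&sk);
    pthread_join(t, &res);

    assert(reader_once(&sk) == -1);           /* later readers see NULL */
    return (int)(intptr_t)res;
}
```

Calling `run_demo()` from a `main()` and building with `-pthread` exercises the model. Dropping the wait loop in `release_sock_model()` reintroduces the bug class described above: a reader that has already loaded the pointer may touch freed memory.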
udp_tunnel_core.c 5.1 KB