1. 25 Apr, 2017 2 commits
    • net/tcp_fastopen: Disable active side TFO in certain scenarios · cf1ef3f0
      Wei Wang committed
      Middlebox firewall issues can cause the server's data to be
      blackholed after a successful 3WHS using TFO. The following are related
      reports from Apple:
      https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf
      Slide 31 identifies an issue where the client ACK to the server's data
      sent during a TFO'd handshake is dropped.
      C ---> syn-data ---> S
      C <--- syn/ack ----- S
      C (accept & write)
      C <---- data ------- S
      C ----- ACK -> X     S
      		[retry and timeout]
      
      https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf
      Slide 5 shows a similar situation in which the server's data gets
      dropped after the 3WHS.
      C ---- syn-data ---> S
      C <--- syn/ack ----- S
      C ---- ack --------> S
      S (accept & write)
      C?  X <- data ------ S
      		[retry and timeout]
      
      This is the worst failure because the client cannot detect such behavior
      to mitigate the situation (such as by disabling TFO). Failing to proceed,
      the application (e.g., an SSL library) may simply time out and retry with
      TFO again, and the process repeats indefinitely.
      
      The proposed solution is to disable active TFO globally under the
      following circumstances:
      1. client side TFO socket detects out of order FIN
      2. client side TFO socket receives out of order RST
      
      We disable active side TFO globally for 1h at first. If it happens
      again, we disable it for 2h, then 4h, 8h, ...
      We reset the timeout to 1h once a client side TFO socket that was not
      opened on loopback has successfully received data segments from the
      server. We examine this condition during close().
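      
      The doubling schedule described above can be sketched in user-space C
      as follows. This is only an illustration under stated assumptions, not
      the kernel code: the struct, the function names, and the cap on the
      shift are hypothetical, and initial_timeout_sec merely mirrors the role
      of the sysctl introduced by this commit.

      ```c
      #include <stdint.h>

      /* Hypothetical state for this sketch; not the kernel's data layout. */
      struct tfo_blackhole_state {
          uint32_t initial_timeout_sec; /* role of the sysctl; 0 disables the logic */
          uint32_t disable_count;       /* detections since the last good transfer */
          uint64_t disabled_at_sec;     /* time of the most recent detection */
      };

      /* Disable period: the initial timeout, doubled for each repeated detection. */
      static uint64_t tfo_disable_period(const struct tfo_blackhole_state *s)
      {
          uint32_t shift;

          if (s->initial_timeout_sec == 0 || s->disable_count == 0)
              return 0;
          shift = s->disable_count - 1;
          if (shift > 6)
              shift = 6; /* the cap is an assumption made for this sketch */
          return (uint64_t)s->initial_timeout_sec << shift;
      }

      /* Called when an out of order FIN or RST is seen on a client TFO socket. */
      static void tfo_blackhole_detected(struct tfo_blackhole_state *s, uint64_t now_sec)
      {
          if (s->initial_timeout_sec == 0)
              return; /* detection logic switched off */
          s->disable_count++;
          s->disabled_at_sec = now_sec;
      }

      /* Returns 1 if active TFO may be attempted at time now_sec. */
      static int tfo_active_allowed(const struct tfo_blackhole_state *s, uint64_t now_sec)
      {
          if (s->disable_count == 0)
              return 1;
          return now_sec >= s->disabled_at_sec + tfo_disable_period(s);
      }

      /* Called when a non-loopback client TFO socket received server data. */
      static void tfo_data_received(struct tfo_blackhole_state *s)
      {
          s->disable_count = 0; /* the next detection starts again at the initial timeout */
      }
      ```

      With initial_timeout_sec set to 3600, the first detection yields a 1h
      period, the second 2h, and a successful data transfer brings the next
      period back to 1h.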
      
      The rationale is that when such a firewall issue happens, the
      application running on the client should eventually close the socket, as
      it is not able to get the data it is expecting. Alternatively, the
      application running on the server should close the socket, as it is not
      able to receive any response from the client.
      In both cases, an out of order FIN or RST will be received on the
      client, given that the firewall will not block them, as those frames
      carry no data.
      We disable active TFO globally because it helps when the middlebox is
      very close to the client and most of the connections are likely to
      fail.
      
      Also, add a debug sysctl:
        tcp_fastopen_blackhole_detect_timeout_sec:
          The initial timeout to use when a firewall blackhole issue happens.
          This can be set and read.
          Setting it to 0 disables the active disable logic.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add rcu locking when changing early demux · 58c4c6a3
      David Ahern committed
      systemd-sysctl is triggering a suspicious RCU usage message when
      net.ipv4.tcp_early_demux or net.ipv4.udp_early_demux is changed via
      a sysctl config file:
      
      [   33.896184] ===============================
      [   33.899558] [ ERR: suspicious RCU usage.  ]
      [   33.900624] 4.11.0-rc7+ #104 Not tainted
      [   33.901698] -------------------------------
      [   33.903059] /home/dsa/kernel-2.git/net/ipv4/sysctl_net_ipv4.c:305 suspicious rcu_dereference_check() usage!
      [   33.905724]
      other info that might help us debug this:
      
      [   33.907656]
      rcu_scheduler_active = 2, debug_locks = 0
      [   33.909288] 1 lock held by systemd-sysctl/143:
      [   33.910373]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff8123a370>] file_start_write+0x45/0x48
      [   33.912407]
      stack backtrace:
      [   33.914018] CPU: 0 PID: 143 Comm: systemd-sysctl Not tainted 4.11.0-rc7+ #104
      [   33.915631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [   33.917870] Call Trace:
      [   33.918431]  dump_stack+0x81/0xb6
      [   33.919241]  lockdep_rcu_suspicious+0x10f/0x118
      [   33.920263]  proc_configure_early_demux+0x65/0x10a
      [   33.921391]  proc_udp_early_demux+0x3a/0x41
      
      Add RCU locking to proc_configure_early_demux.
      
      Fixes: dddb64bc ("net: Add sysctl to toggle early demux for tcp and udp")
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 22 Apr, 2017 1 commit
    • ip_tunnel: Allow policy-based routing through tunnels · 9830ad4c
      Craig Gallek committed
      This feature allows the administrator to set an fwmark for
      packets traversing a tunnel.  This allows the use of independent
      routing tables for tunneled packets without the use of iptables.
      
      There is no concept of per-packet routing decisions through IPv4
      tunnels, so this implementation does not need to work with
      per-packet route lookups as the v6 implementation may
      (with IP6_TNL_F_USE_ORIG_FWMARK).
      
      Further, since the v4 tunnel ioctls share data structures
      (which cannot be trivially modified) with the kernel's internal
      tunnel configuration structures, the mark attribute must be stored
      in the tunnel structure itself and passed as a parameter when
      creating or changing tunnel attributes.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 21 Apr, 2017 3 commits
  4. 19 Apr, 2017 1 commit
    • esp4/6: Fix GSO path for non-GSO SW-crypto packets · 8f92e03e
      Ilan Tayari committed
      If esp*_offload module is loaded, outbound packets take the
      GSO code path, being encapsulated at layer 3, but encrypted
      in layer 2. validate_xmit_xfrm calls esp*_xmit for that.
      
      esp*_xmit was wrongly detecting these packets as going
      through hardware crypto offload, while in fact they should
      be encrypted in software, causing plaintext leakage to
      the network, and also drops at the receiver side.
      
      Perform the encryption in esp*_xmit, if the SA doesn't have
      a hardware offload_handle.
      
      Also, align esp6 code to esp4 logic.
      
      Fixes: fca11ebd ("esp4: Reorganize esp_output")
      Fixes: 383d0350 ("esp6: Reorganize esp_output")
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  5. 18 Apr, 2017 3 commits
  6. 14 Apr, 2017 10 commits
  7. 10 Apr, 2017 1 commit
    • tcp: clear saved_syn in tcp_disconnect() · 17c3060b
      Eric Dumazet committed
      In the (very unlikely) case a passive socket becomes a listener,
      we do not want to duplicate its saved SYN headers.
      
      This would lead to double frees and use-after-free, and would please
      hackers and various fuzzers.
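      
      The hazard can be illustrated with a small user-space C sketch. It is
      not the kernel change itself: struct conn, conn_save_syn(),
      conn_disconnect() and conn_clone_shallow() are hypothetical names used
      to show why an owned buffer pointer has to be released and cleared
      before the object is reused and shallow-copied.

      ```c
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical stand-in for a socket that owns a saved copy of SYN headers. */
      struct conn {
          unsigned char *saved_syn;
          size_t saved_len;
      };

      /* Store a private copy of the SYN headers; returns 0 on success. */
      static int conn_save_syn(struct conn *c, const unsigned char *hdr, size_t len)
      {
          unsigned char *copy = malloc(len);

          if (!copy)
              return -1;
          memcpy(copy, hdr, len);
          free(c->saved_syn); /* drop any previous copy */
          c->saved_syn = copy;
          c->saved_len = len;
          return 0;
      }

      /* On disconnect, free the buffer and clear the pointer so that a later
       * shallow copy of this object cannot share (and later double free) it. */
      static void conn_disconnect(struct conn *c)
      {
          free(c->saved_syn);
          c->saved_syn = NULL;
          c->saved_len = 0;
      }

      /* Shallow copy, loosely modelling how a child is created from a listener. */
      static struct conn conn_clone_shallow(const struct conn *parent)
      {
          return *parent;
      }
      ```

      If conn_disconnect() left saved_syn set, every clone made afterwards
      would carry the same pointer, and freeing parent and child would free
      it twice.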
      
      Tested:
          0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 setsockopt(3, IPPROTO_TCP, TCP_SAVE_SYN, [1], 4) = 0
         +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
      
         +0 bind(3, ..., ...) = 0
         +0 listen(3, 5) = 0
      
         +0 < S 0:0(0) win 32972 <mss 1460,nop,wscale 7>
         +0 > S. 0:0(0) ack 1 <...>
        +.1 < . 1:1(0) ack 1 win 257
         +0 accept(3, ..., ...) = 4
      
         +0 connect(4, AF_UNSPEC, ...) = 0
         +0 close(3) = 0
         +0 bind(4, ..., ...) = 0
         +0 listen(4, 5) = 0
      
         +0 < S 0:0(0) win 32972 <mss 1460,nop,wscale 7>
         +0 > S. 0:0(0) ack 1 <...>
        +.1 < . 1:1(0) ack 1 win 257
      
      Fixes: cd8ae852 ("tcp: provide SYN headers for passive connections")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 08 Apr, 2017 2 commits
  9. 07 Apr, 2017 2 commits
    • net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given · bbadb9a2
      Florian Larysch committed
      inet_rtm_getroute synthesizes a skeletal ICMP skb, which is passed to
      ip_route_input when iif is given. If a multipath route is present for
      the designated destination, fib_multipath_hash ends up being called with
      that skb. However, as that skb contains no information beyond the
      protocol type, the calculated hash does not match the one we would see
      for a real packet.
      
      There is currently no way to fix this for layer 4 hashing, as
      RTM_GETROUTE doesn't have the necessary information to create layer 4
      headers. To fix this for layer 3 hashing, set appropriate saddr/daddrs
      in the skb and also change the protocol to UDP to avoid special
      treatment for ICMP.
      Signed-off-by: Florian Larysch <fl@n621.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given · a8801799
      Florian Larysch committed
      inet_rtm_getroute synthesizes a skeletal ICMP skb, which is passed to
      ip_route_input when iif is given. If a multipath route is present for
      the designated destination, ip_multipath_icmp_hash ends up being called,
      which uses the source/destination addresses within the skb to calculate
      a hash. However, those are not set in the synthetic skb, causing it to
      return an arbitrary and incorrect result.
      
      Instead, use UDP, which gets no such special treatment.
      Signed-off-by: Florian Larysch <fl@n621.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 06 Apr, 2017 2 commits
  11. 05 Apr, 2017 1 commit
  12. 04 Apr, 2017 1 commit
    • tcp: minimize false-positives on TCP/GRO check · 0b9aefea
      Marcelo Ricardo Leitner committed
      Markus Trippelsdorf reported that after commit dcb17d22 ("tcp: warn
      on bogus MSS and try to amend it") the kernel started logging the
      warning for a NIC driver that doesn't even support GRO.
      
      The diagnosis was that it was possibly triggered on connections that
      were using TCP Timestamps but on which some packets lacked the Timestamps
      option. As we reduce rcv_mss when timestamps are used, packets without
      them would be bigger than expected, although this is a valid case.
      
      As this warning is more of a hint, getting a clean cut on the
      threshold is probably not worth the execution time spent on it. This
      patch thus alleviates the false positives with 2 quick checks: by
      accounting for the entire TCP option space and also checking against the
      interface MTU if it's available.
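      
      The two quick checks can be sketched as a standalone C predicate. This
      is a hedged illustration, not the patched kernel function:
      should_warn_bogus_gro() and its parameters are hypothetical, and the
      exact comparison against the MTU is an assumption. Only the 40-byte
      maximum TCP option space is a protocol fact.

      ```c
      #include <stdbool.h>

      /* TCP options can occupy at most 40 bytes of the header. */
      #define MAX_TCP_OPTION_SPACE 40

      /*
       * Decide whether a received segment length looks like a bogus GRO result.
       * len      : payload length of the received segment
       * rcv_mss  : current estimate of the peer's MSS
       * have_mtu : whether the receiving interface's MTU could be looked up
       * mtu      : that MTU, only meaningful when have_mtu is true
       */
      static bool should_warn_bogus_gro(unsigned int len, unsigned int rcv_mss,
                                        bool have_mtu, unsigned int mtu)
      {
          /* Check 1: tolerate the whole option space, e.g. missing timestamps. */
          if (len <= rcv_mss + MAX_TCP_OPTION_SPACE)
              return false;

          /* Check 2: a segment that still fits in the MTU is plausible. */
          if (have_mtu && len < mtu)
              return false;

          return true;
      }
      ```

      Both checks are cheap comparisons, which matches the stated goal of not
      spending execution time on a warning that is only a hint.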
      
      These changes, especially the MTU one, might mask some real positives,
      though if they are really happening, it's possible that sooner or later
      the warning will be triggered anyway.
      Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 31 Mar, 2017 1 commit
  14. 29 Mar, 2017 3 commits
  15. 28 Mar, 2017 1 commit
    • net: ipconfig: fix ic_close_devs() use-after-free · ffefb6f4
      Mark Rutland committed
      Our chosen ic_dev may be anywhere in our list of ic_devs, and we may
      free it before attempting to close others. When we compare d->dev and
      ic_dev->dev, we're potentially dereferencing memory returned to the
      allocator. This causes KASAN to scream for each subsequent ic_dev we
      check.
      
      As there's a 1-1 mapping between ic_devs and netdevs, we can instead
      compare d and ic_dev directly, which implicitly handles the !ic_dev
      case, and avoids the use-after-free. The ic_dev pointer may be stale,
      but we will not dereference it.
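      
      A minimal user-space C sketch of the pattern described above: walk the
      list, compare node pointers instead of dereferencing the chosen node,
      and free every node. The types and the close_unchosen() helper are
      hypothetical stand-ins, not the kernel code.

      ```c
      #include <stdlib.h>

      /* Hypothetical stand-ins for a net device and an ipconfig candidate. */
      struct netdev {
          int closed;
      };

      struct ic_device {
          struct ic_device *next;
          struct netdev *dev;
      };

      /*
       * Free every candidate in the list and close the device of each one
       * except `chosen`. Returns the number of devices closed.
       *
       * `chosen` may already have been freed by an earlier iteration, so it is
       * only ever compared by value and never dereferenced. Comparing
       * d->dev with chosen->dev instead would read freed memory.
       */
      static int close_unchosen(struct ic_device **head, const struct ic_device *chosen)
      {
          struct ic_device *d = *head;
          struct ic_device *next;
          int closed = 0;

          *head = NULL;
          while (d) {
              next = d->next;
              if (d != chosen) {
                  d->dev->closed = 1;
                  closed++;
              }
              free(d);
              d = next;
          }
          return closed;
      }
      ```

      A NULL `chosen` simply never matches, so every device is closed, which
      is the implicit !ic_dev handling mentioned above.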
      
      Original splat:
      
      [    6.487446] ==================================================================
      [    6.494693] BUG: KASAN: use-after-free in ic_close_devs+0xc4/0x154 at addr ffff800367efa708
      [    6.503013] Read of size 8 by task swapper/0/1
      [    6.507452] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc3-00002-gda42158 #8
      [    6.514993] Hardware name: AppliedMicro Mustang/Mustang, BIOS 3.05.05-beta_rc Jan 27 2016
      [    6.523138] Call trace:
      [    6.525590] [<ffff200008094778>] dump_backtrace+0x0/0x570
      [    6.530976] [<ffff200008094d08>] show_stack+0x20/0x30
      [    6.536017] [<ffff200008bee928>] dump_stack+0x120/0x188
      [    6.541231] [<ffff20000856d5e4>] kasan_object_err+0x24/0xa0
      [    6.546790] [<ffff20000856d924>] kasan_report_error+0x244/0x738
      [    6.552695] [<ffff20000856dfec>] __asan_report_load8_noabort+0x54/0x80
      [    6.559204] [<ffff20000aae86ac>] ic_close_devs+0xc4/0x154
      [    6.564590] [<ffff20000aaedbac>] ip_auto_config+0x2ed4/0x2f1c
      [    6.570321] [<ffff200008084b04>] do_one_initcall+0xcc/0x370
      [    6.575882] [<ffff20000aa31de8>] kernel_init_freeable+0x5f8/0x6c4
      [    6.581959] [<ffff20000a16df00>] kernel_init+0x18/0x190
      [    6.587171] [<ffff200008084710>] ret_from_fork+0x10/0x40
      [    6.592468] Object at ffff800367efa700, in cache kmalloc-128 size: 128
      [    6.598969] Allocated:
      [    6.601324] PID = 1
      [    6.603427]  save_stack_trace_tsk+0x0/0x418
      [    6.607603]  save_stack_trace+0x20/0x30
      [    6.611430]  kasan_kmalloc+0xd8/0x188
      [    6.615087]  ip_auto_config+0x8c4/0x2f1c
      [    6.619002]  do_one_initcall+0xcc/0x370
      [    6.622832]  kernel_init_freeable+0x5f8/0x6c4
      [    6.627178]  kernel_init+0x18/0x190
      [    6.630660]  ret_from_fork+0x10/0x40
      [    6.634223] Freed:
      [    6.636233] PID = 1
      [    6.638334]  save_stack_trace_tsk+0x0/0x418
      [    6.642510]  save_stack_trace+0x20/0x30
      [    6.646337]  kasan_slab_free+0x88/0x178
      [    6.650167]  kfree+0xb8/0x478
      [    6.653131]  ic_close_devs+0x130/0x154
      [    6.656875]  ip_auto_config+0x2ed4/0x2f1c
      [    6.660875]  do_one_initcall+0xcc/0x370
      [    6.664705]  kernel_init_freeable+0x5f8/0x6c4
      [    6.669051]  kernel_init+0x18/0x190
      [    6.672534]  ret_from_fork+0x10/0x40
      [    6.676098] Memory state around the buggy address:
      [    6.680880]  ffff800367efa600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [    6.688078]  ffff800367efa680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [    6.695276] >ffff800367efa700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [    6.702469]                       ^
      [    6.705952]  ffff800367efa780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [    6.713149]  ffff800367efa800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [    6.720343] ==================================================================
      [    6.727536] Disabling lock debugging due to kernel taint
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 27 Mar, 2017 2 commits
    • netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to register · 75c689dc
      Gao Feng committed
      In commit 93557f53 ("netfilter: nf_conntrack: nf_conntrack snmp
      helper"), the snmp_helper was replaced by nf_nat_snmp_hook, so the
      snmp_helper is never registered. The error handler still tries to
      unregister the snmp_helper, which could cause a panic.
      
      Remove the useless snmp_helper and the unregister call in the
      error handler.
      
      Fixes: 93557f53 ("netfilter: nf_conntrack: nf_conntrack snmp helper")
      Signed-off-by: Gao Feng <fgao@ikuai8.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: invoke synchronize_rcu after set the _hook_ to NULL · 3b7dabf0
      Liping Zhang committed
      Otherwise, another CPU may access the invalid pointer. For example:
          CPU0                CPU1
           -              rcu_read_lock();
           -              pfunc = _hook_;
        _hook_ = NULL;          -
        mod unload              -
           -                 pfunc(); // invalid, panic
           -             rcu_read_unlock();
      
      So we must call synchronize_rcu() to wait for the RCU readers to finish.
      
      Also note that in nf_nat_snmp_basic_fini, synchronize_rcu() will be
      invoked by the later nf_conntrack_helper_unregister, but I'm inclined to
      add an explicit synchronize_rcu after setting nf_nat_snmp_hook to NULL.
      Depending on such obscure assumptions is not a good idea.
      
      Last, in nfnetlink_cttimeout, we use kfree_rcu to free the timeout
      object, so invoking rcu_barrier() in cttimeout_exit is not necessary at
      all; remove it too.
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  17. 25 Mar, 2017 3 commits
  18. 23 Mar, 2017 1 commit
    • inet: frag: release spinlock before calling icmp_send() · ec4fbd64
      Eric Dumazet committed
      Dmitry reported a lockdep splat [1] (false positive) that we can fix
      by releasing the spinlock before calling icmp_send() from ip_expire().
      
      This is a false positive because sending an ICMP message cannot
      possibly re-enter the IP frag engine.
      
      [1]
      [ INFO: possible circular locking dependency detected ]
      4.10.0+ #29 Not tainted
      -------------------------------------------------------
      modprobe/12392 is trying to acquire lock:
       (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] spin_lock
      include/linux/spinlock.h:299 [inline]
       (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] __netif_tx_lock
      include/linux/netdevice.h:3486 [inline]
       (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>]
      sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
      
      but task is already holding lock:
       (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock
      include/linux/spinlock.h:299 [inline]
       (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>]
      ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&(&q->lock)->rlock){+.-...}:
             validate_chain kernel/locking/lockdep.c:2267 [inline]
             __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
             lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
             __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
             _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
             spin_lock include/linux/spinlock.h:299 [inline]
             ip_defrag+0x3a2/0x4130 net/ipv4/ip_fragment.c:669
             ip_check_defrag+0x4e3/0x8b0 net/ipv4/ip_fragment.c:713
             packet_rcv_fanout+0x282/0x800 net/packet/af_packet.c:1459
             deliver_skb net/core/dev.c:1834 [inline]
             dev_queue_xmit_nit+0x294/0xa90 net/core/dev.c:1890
             xmit_one net/core/dev.c:2903 [inline]
             dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2923
             sch_direct_xmit+0x31f/0x6d0 net/sched/sch_generic.c:182
             __dev_xmit_skb net/core/dev.c:3092 [inline]
             __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
             dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
             neigh_resolve_output+0x6b9/0xb10 net/core/neighbour.c:1308
             neigh_output include/net/neighbour.h:478 [inline]
             ip_finish_output2+0x8b8/0x15a0 net/ipv4/ip_output.c:228
             ip_do_fragment+0x1d93/0x2720 net/ipv4/ip_output.c:672
             ip_fragment.constprop.54+0x145/0x200 net/ipv4/ip_output.c:545
             ip_finish_output+0x82d/0xe10 net/ipv4/ip_output.c:314
             NF_HOOK_COND include/linux/netfilter.h:246 [inline]
             ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
             dst_output include/net/dst.h:486 [inline]
             ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
             ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
             ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
             raw_sendmsg+0x26de/0x3a00 net/ipv4/raw.c:655
             inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
             sock_sendmsg_nosec net/socket.c:633 [inline]
             sock_sendmsg+0xca/0x110 net/socket.c:643
             ___sys_sendmsg+0x4a3/0x9f0 net/socket.c:1985
             __sys_sendmmsg+0x25c/0x750 net/socket.c:2075
             SYSC_sendmmsg net/socket.c:2106 [inline]
             SyS_sendmmsg+0x35/0x60 net/socket.c:2101
             do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
             return_from_SYSCALL_64+0x0/0x7a
      
      -> #0 (_xmit_ETHER#2){+.-...}:
             check_prev_add kernel/locking/lockdep.c:1830 [inline]
             check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
             validate_chain kernel/locking/lockdep.c:2267 [inline]
             __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
             lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
             __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
             _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
             spin_lock include/linux/spinlock.h:299 [inline]
             __netif_tx_lock include/linux/netdevice.h:3486 [inline]
             sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
             __dev_xmit_skb net/core/dev.c:3092 [inline]
             __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
             dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
             neigh_hh_output include/net/neighbour.h:468 [inline]
             neigh_output include/net/neighbour.h:476 [inline]
             ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228
             ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316
             NF_HOOK_COND include/linux/netfilter.h:246 [inline]
             ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
             dst_output include/net/dst.h:486 [inline]
             ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
             ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
             ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
             icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394
             icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754
             ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239
             call_timer_fn+0x241/0x820 kernel/time/timer.c:1268
             expire_timers kernel/time/timer.c:1307 [inline]
             __run_timers+0x960/0xcf0 kernel/time/timer.c:1601
             run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
             __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
             invoke_softirq kernel/softirq.c:364 [inline]
             irq_exit+0x1cc/0x200 kernel/softirq.c:405
             exiting_irq arch/x86/include/asm/apic.h:657 [inline]
             smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
             apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
             __read_once_size include/linux/compiler.h:254 [inline]
             atomic_read arch/x86/include/asm/atomic.h:26 [inline]
             rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline]
             __rcu_is_watching kernel/rcu/tree.c:1133 [inline]
             rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147
             rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293
             radix_tree_deref_slot include/linux/radix-tree.h:238 [inline]
             filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335
             do_fault_around mm/memory.c:3231 [inline]
             do_read_fault mm/memory.c:3265 [inline]
             do_fault+0xbd5/0x2080 mm/memory.c:3370
             handle_pte_fault mm/memory.c:3600 [inline]
             __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714
             handle_mm_fault+0x1e2/0x480 mm/memory.c:3751
             __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397
             do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460
             page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&(&q->lock)->rlock);
                                     lock(_xmit_ETHER#2);
                                     lock(&(&q->lock)->rlock);
        lock(_xmit_ETHER#2);
      
       *** DEADLOCK ***
      
      10 locks held by modprobe/12392:
       #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81329758>]
      __do_page_fault+0x2b8/0xb60 arch/x86/mm/fault.c:1336
       #1:  (rcu_read_lock){......}, at: [<ffffffff8188cab6>]
      filemap_map_pages+0x1e6/0x1570 mm/filemap.c:2324
       #2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
      spin_lock include/linux/spinlock.h:299 [inline]
       #2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
      pte_alloc_one_map mm/memory.c:2944 [inline]
       #2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
      alloc_set_pte+0x13b8/0x1b90 mm/memory.c:3072
       #3:  (((&q->timer))){+.-...}, at: [<ffffffff81627e72>]
      lockdep_copy_map include/linux/lockdep.h:175 [inline]
       #3:  (((&q->timer))){+.-...}, at: [<ffffffff81627e72>]
      call_timer_fn+0x1c2/0x820 kernel/time/timer.c:1258
       #4:  (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock
      include/linux/spinlock.h:299 [inline]
       #4:  (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>]
      ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201
       #5:  (rcu_read_lock){......}, at: [<ffffffff8389a633>]
      ip_expire+0x1b3/0x6c0 net/ipv4/ip_fragment.c:216
       #6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] spin_trylock
      include/linux/spinlock.h:309 [inline]
       #6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_xmit_lock
      net/ipv4/icmp.c:219 [inline]
       #6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>]
      icmp_send+0x803/0x1c80 net/ipv4/icmp.c:681
       #7:  (rcu_read_lock_bh){......}, at: [<ffffffff838ab9a1>]
      ip_finish_output2+0x2c1/0x15a0 net/ipv4/ip_output.c:198
       #8:  (rcu_read_lock_bh){......}, at: [<ffffffff836d1dee>]
      __dev_queue_xmit+0x23e/0x1e60 net/core/dev.c:3324
       #9:  (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at:
      [<ffffffff836d3a27>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
      
      stack backtrace:
      CPU: 0 PID: 12392 Comm: modprobe Not tainted 4.10.0+ #29
      Hardware name: Google Google Compute Engine/Google Compute Engine,
      BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
       print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204
       check_prev_add kernel/locking/lockdep.c:1830 [inline]
       check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
       validate_chain kernel/locking/lockdep.c:2267 [inline]
       __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
       lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:299 [inline]
       __netif_tx_lock include/linux/netdevice.h:3486 [inline]
       sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
       __dev_xmit_skb net/core/dev.c:3092 [inline]
       __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
       neigh_hh_output include/net/neighbour.h:468 [inline]
       neigh_output include/net/neighbour.h:476 [inline]
       ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228
       ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316
       NF_HOOK_COND include/linux/netfilter.h:246 [inline]
       ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
       dst_output include/net/dst.h:486 [inline]
       ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
       ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
       ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
       icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394
       icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754
       ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239
       call_timer_fn+0x241/0x820 kernel/time/timer.c:1268
       expire_timers kernel/time/timer.c:1307 [inline]
       __run_timers+0x960/0xcf0 kernel/time/timer.c:1601
       run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
       __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364 [inline]
       irq_exit+0x1cc/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:657 [inline]
       smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
       apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
      RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline]
      RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
      RIP: 0010:rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline]
      RIP: 0010:__rcu_is_watching kernel/rcu/tree.c:1133 [inline]
      RIP: 0010:rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147
      RSP: 0000:ffff8801c391f120 EFLAGS: 00000a03 ORIG_RAX: ffffffffffffff10
      RAX: dffffc0000000000 RBX: ffff8801c391f148 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 000055edd4374000 RDI: ffff8801dbe1ae0c
      RBP: ffff8801c391f1a0 R08: 0000000000000002 R09: 0000000000000000
      R10: dffffc0000000000 R11: 0000000000000002 R12: 1ffff10038723e25
      R13: ffff8801dbe1ae00 R14: ffff8801c391f680 R15: dffffc0000000000
       </IRQ>
       rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293
       radix_tree_deref_slot include/linux/radix-tree.h:238 [inline]
       filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335
       do_fault_around mm/memory.c:3231 [inline]
       do_read_fault mm/memory.c:3265 [inline]
       do_fault+0xbd5/0x2080 mm/memory.c:3370
       handle_pte_fault mm/memory.c:3600 [inline]
       __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714
       handle_mm_fault+0x1e2/0x480 mm/memory.c:3751
       __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397
       do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460
       page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011
      RIP: 0033:0x7f83172f2786
      RSP: 002b:00007fffe859ae80 EFLAGS: 00010293
      RAX: 000055edd4373040 RBX: 00007f83175111c8 RCX: 000055edd4373238
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f8317510970
      RBP: 00007fffe859afd0 R08: 0000000000000009 R09: 0000000000000000
      R10: 0000000000000064 R11: 0000000000000000 R12: 000055edd4373040
      R13: 0000000000000000 R14: 00007fffe859afe8 R15: 0000000000000000
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>