1. 14 3月, 2017 3 次提交
    • J
      dccp/tcp: fix routing redirect race · 45caeaa5
      Jon Maxwell 提交于
      As Eric Dumazet pointed out this also needs to be fixed in IPv6.
      v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.
      
      We have seen a few incidents lately where a dst_enty has been freed
      with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
      dst_entry. If the conditions/timings are right a crash then ensues when the
      freed dst_entry is referenced later on. A Common crashing back trace is:
      
       #8 [] page_fault at ffffffff8163e648
          [exception RIP: __tcp_ack_snd_check+74]
      .
      .
       #9 [] tcp_rcv_established at ffffffff81580b64
      #10 [] tcp_v4_do_rcv at ffffffff8158b54a
      #11 [] tcp_v4_rcv at ffffffff8158cd02
      #12 [] ip_local_deliver_finish at ffffffff815668f4
      #13 [] ip_local_deliver at ffffffff81566bd9
      #14 [] ip_rcv_finish at ffffffff8156656d
      #15 [] ip_rcv at ffffffff81566f06
      #16 [] __netif_receive_skb_core at ffffffff8152b3a2
      #17 [] __netif_receive_skb at ffffffff8152b608
      #18 [] netif_receive_skb at ffffffff8152b690
      #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
      #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
      #21 [] net_rx_action at ffffffff8152bac2
      #22 [] __do_softirq at ffffffff81084b4f
      #23 [] call_softirq at ffffffff8164845c
      #24 [] do_softirq at ffffffff81016fc5
      #25 [] irq_exit at ffffffff81084ee5
      #26 [] do_IRQ at ffffffff81648ff8
      
      Of course it may happen with other NIC drivers as well.
      
      It's found the freed dst_entry here:
      
       224 static bool tcp_in_quickack_mode(struct sock *sk)
       225 {
       226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);
       227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);
       228 
       229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
       230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
       231 }
      
      But there are other backtraces attributed to the same freed dst_entry in
      netfilter code as well.
      
      All the vmcores showed 2 significant clues:
      
      - Remote hosts behind the default gateway had always been redirected to a
      different gateway. A rtable/dst_entry will be added for that host. Making
      more dst_entrys with lower reference counts. Making this more probable.
      
      - All vmcores showed a postitive LockDroppedIcmps value, e.g:
      
      LockDroppedIcmps                  267
      
      A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
      regardless of whether user space has the socket locked. This can result in a
      race condition where the same dst_entry cached in sk->sk_dst_entry can be
      decremented twice for the same socket via:
      
      do_redirect()->__sk_dst_check()-> dst_release().
      
      Which leads to the dst_entry being prematurely freed with another socket
      pointing to it via sk->sk_dst_cache and a subsequent crash.
      
      To fix this skip do_redirect() if usespace has the socket locked. Instead let
      the redirect take place later when user space does not have the socket
      locked.
      
      The dccp/IPv6 code is very similar in this respect, so fixing it there too.
      
      As Eric Garver pointed out the following commit now invalidates routes. Which
      can set the dst->obsolete flag so that ipv4_dst_check() returns null and
      triggers the dst_release().
      
      Fixes: ceb33206 ("ipv4: Kill routes during PMTU/redirect updates.")
      Cc: Eric Garver <egarver@redhat.com>
      Cc: Hannes Sowa <hsowa@redhat.com>
      Signed-off-by: NJon Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45caeaa5
    • F
      ipv6: avoid write to a possibly cloned skb · 79e49503
      Florian Westphal 提交于
      ip6_fragment, in case skb has a fraglist, checks if the
      skb is cloned.  If it is, it will move to the 'slow path' and allocates
      new skbs for each fragment.
      
      However, right before entering the slowpath loop, it updates the
      nexthdr value of the last ipv6 extension header to NEXTHDR_FRAGMENT,
      to account for the fragment header that will be inserted in the new
      ipv6-fragment skbs.
      
      In case original skb is cloned this munges nexthdr value of another
      skb.  Avoid this by doing the nexthdr update for each of the new fragment
      skbs separately.
      
      This was observed with tcpdump on a bridge device where netfilter ipv6
      reassembly is active:  tcpdump shows malformed fragment headers as
      the l4 header (icmpv6, tcp, etc). is decoded as a fragment header.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: NAndreas Karis <akaris@redhat.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79e49503
    • S
      ipv6: make ECMP route replacement less greedy · 67e19400
      Sabrina Dubroca 提交于
      Commit 27596472 ("ipv6: fix ECMP route replacement") introduced a
      loop that removes all siblings of an ECMP route that is being
      replaced. However, this loop doesn't stop when it has replaced
      siblings, and keeps removing other routes with a higher metric.
      We also end up triggering the WARN_ON after the loop, because after
      this nsiblings < 0.
      
      Instead, stop the loop when we have taken care of all routes with the
      same metric as the route being replaced.
      
        Reproducer:
        ===========
          #!/bin/sh
      
          ip netns add ns1
          ip netns add ns2
          ip -net ns1 link set lo up
      
          for x in 0 1 2 ; do
              ip link add veth$x netns ns2 type veth peer name eth$x netns ns1
              ip -net ns1 link set eth$x up
              ip -net ns2 link set veth$x up
          done
      
          ip -net ns1 -6 r a 2000::/64 nexthop via fe80::0 dev eth0 \
                  nexthop via fe80::1 dev eth1 nexthop via fe80::2 dev eth2
          ip -net ns1 -6 r a 2000::/64 via fe80::42 dev eth0 metric 256
          ip -net ns1 -6 r a 2000::/64 via fe80::43 dev eth0 metric 2048
      
          echo "before replace, 3 routes"
          ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
          echo
      
          ip -net ns1 -6 r c 2000::/64 nexthop via fe80::4 dev eth0 \
                  nexthop via fe80::5 dev eth1 nexthop via fe80::6 dev eth2
      
          echo "after replace, only 2 routes, metric 2048 is gone"
          ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
      
      Fixes: 27596472 ("ipv6: fix ECMP route replacement")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: NXin Long <lucien.xin@gmail.com>
      Reviewed-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67e19400
  2. 10 3月, 2017 3 次提交
    • A
      udp: avoid ufo handling on IP payload compression packets · 4b3b45ed
      Alexey Kodanev 提交于
      commit c146066a ("ipv4: Don't use ufo handling on later transformed
      packets") and commit f89c56ce ("ipv6: Don't use ufo handling on
      later transformed packets") added a check that 'rt->dst.header_len' isn't
      zero in order to skip UFO, but it doesn't include IPcomp in transport mode
      where it equals zero.
      
      Packets, after payload compression, may not require further fragmentation,
      and if original length exceeds MTU, later compressed packets will be
      transmitted incorrectly. This can be reproduced with LTP udp_ipsec.sh test
      on veth device with enabled UFO, MTU is 1500 and UDP payload is 2000:
      
      * IPv4 case, offset is wrong + unnecessary fragmentation
          udp_ipsec.sh -p comp -m transport -s 2000 &
          tcpdump -ni ltp_ns_veth2
          ...
          IP (tos 0x0, ttl 64, id 45203, offset 0, flags [+],
            proto Compressed IP (108), length 49)
            10.0.0.2 > 10.0.0.1: IPComp(cpi=0x1000)
          IP (tos 0x0, ttl 64, id 45203, offset 1480, flags [none],
            proto UDP (17), length 21) 10.0.0.2 > 10.0.0.1: ip-proto-17
      
      * IPv6 case, sending small fragments
          udp_ipsec.sh -6 -p comp -m transport -s 2000 &
          tcpdump -ni ltp_ns_veth2
          ...
          IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
            payload length: 37) fd00::2 > fd00::1: IPComp(cpi=0x1000)
          IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
            payload length: 21) fd00::2 > fd00::1: IPComp(cpi=0x1000)
      
      Fix it by checking 'rt->dst.xfrm' pointer to 'xfrm_state' struct, skip UFO
      if xfrm is set. So the new check will include both cases: IPcomp and IPsec.
      
      Fixes: c146066a ("ipv4: Don't use ufo handling on later transformed packets")
      Fixes: f89c56ce ("ipv6: Don't use ufo handling on later transformed packets")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b3b45ed
    • P
      net/tunnel: set inner protocol in network gro hooks · 294acf1c
      Paolo Abeni 提交于
      The gso code of several tunnels type (gre and udp tunnels)
      takes for granted that the skb->inner_protocol is properly
      initialized and drops the packet elsewhere.
      
      On the forwarding path no one is initializing such field,
      so gro encapsulated packets are dropped on forward.
      
      Since commit 38720352 ("gre: Use inner_proto to obtain
      inner header protocol"), this can be reproduced when the
      encapsulated packets use gre as the tunneling protocol.
      
      The issue happens also with vxlan and geneve tunnels since
      commit 8bce6d7d ("udp: Generalize skb_udp_segment"), if the
      forwarding host's ingress nic has h/w offload for such tunnel
      and a vxlan/geneve device is configured on top of it, regardless
      of the configured peer address and vni.
      
      To address the issue, this change initialize the inner_protocol
      field for encapsulated packets in both ipv4 and ipv6 gro complete
      callbacks.
      
      Fixes: 38720352 ("gre: Use inner_proto to obtain inner header protocol")
      Fixes: 8bce6d7d ("udp: Generalize skb_udp_segment")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      294acf1c
    • D
      net: ipv6: Remove redundant RTA_OIF in multipath routes · 5be083ce
      David Ahern 提交于
      Dinesh reported that RTA_MULTIPATH nexthops are 8-bytes larger with IPv6
      than IPv4. The recent refactoring for multipath support in netlink
      messages does discriminate between non-multipath which needs the OIF
      and multipath which adds a rtnexthop struct for each hop making the
      RTA_OIF attribute redundant. Resolve by adding a flag to the info
      function to skip the oif for multipath.
      
      Fixes: beb1afac ("net: ipv6: Add support to dump multipath routes
             via RTA_MULTIPATH attribute")
      Reported-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5be083ce
  3. 08 3月, 2017 1 次提交
    • W
      ipv6: reorder icmpv6_init() and ip6_mr_init() · 15e66807
      WANG Cong 提交于
      Andrey reported the following kernel crash:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 14446 Comm: syz-executor6 Not tainted 4.10.0+ #82
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff88001f311700 task.stack: ffff88001f6e8000
      RIP: 0010:ip6mr_sk_done+0x15a/0x3d0 net/ipv6/ip6mr.c:1618
      RSP: 0018:ffff88001f6ef418 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 1ffff10003edde8c RCX: ffffc900043ee000
      RDX: 0000000000000004 RSI: ffffffff83e3b3f8 RDI: 0000000000000020
      RBP: ffff88001f6ef508 R08: fffffbfff0dcc5d8 R09: 0000000000000000
      R10: ffffffff86e62ec0 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff88001f6ef4e0 R15: ffff8800380a0040
      FS:  00007f7a52cec700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000061c500 CR3: 000000001f1ae000 CR4: 00000000000006f0
      DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       rawv6_close+0x4c/0x80 net/ipv6/raw.c:1217
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
       sock_release+0x8d/0x1e0 net/socket.c:597
       __sock_create+0x39d/0x880 net/socket.c:1226
       sock_create_kern+0x3f/0x50 net/socket.c:1243
       inet_ctl_sock_create+0xbb/0x280 net/ipv4/af_inet.c:1526
       icmpv6_sk_init+0x163/0x500 net/ipv6/icmp.c:954
       ops_init+0x10a/0x550 net/core/net_namespace.c:115
       setup_net+0x261/0x660 net/core/net_namespace.c:291
       copy_net_ns+0x27e/0x540 net/core/net_namespace.c:396
      9pnet_virtio: no channels available for device ./file1
       create_new_namespaces+0x437/0x9b0 kernel/nsproxy.c:106
       unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
       SYSC_unshare kernel/fork.c:2281 [inline]
       SyS_unshare+0x64e/0x1000 kernel/fork.c:2231
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      This is because net->ipv6.mr6_tables is not initialized at that point,
      ip6mr_rules_init() is not called yet, therefore on the error path when
      we iterator the list, we trigger this oops. Fix this by reordering
      ip6mr_rules_init() before icmpv6_sk_init().
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15e66807
  4. 03 3月, 2017 2 次提交
  5. 02 3月, 2017 4 次提交
    • I
      sched/headers: Prepare to move signal wakeup & sigpending methods from... · 174cd4b1
      Ingo Molnar 提交于
      sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
      
      Fix up affected files that include this signal functionality via sched.h.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      174cd4b1
    • E
      ipv6: orphan skbs in reassembly unit · 48cac18e
      Eric Dumazet 提交于
      Andrey reported a use-after-free in IPv6 stack.
      
      Issue here is that we free the socket while it still has skb
      in TX path and in some queues.
      
      It happens here because IPv6 reassembly unit messes skb->truesize,
      breaking skb_set_owner_w() badly.
      
      We fixed a similar issue for IPV4 in commit 8282f274 ("inet: frag:
      Always orphan skbs inside ip_defrag()")
      Acked-by: NJoe Stringer <joe@ovn.org>
      
      ==================================================================
      BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
      Read of size 8 at addr ffff880062da0060 by task a.out/4140
      
      page:ffffea00018b6800 count:1 mapcount:0 mapping:          (null)
      index:0x0 compound_mapcount: 0
      flags: 0x100000000008100(slab|head)
      raw: 0100000000008100 0000000000000000 0000000000000000 0000000180130013
      raw: dead000000000100 dead000000000200 ffff88006741f140 0000000000000000
      page dumped because: kasan: bad access detected
      
      CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15
       dump_stack+0x292/0x398 lib/dump_stack.c:51
       describe_address mm/kasan/report.c:262
       kasan_report_error+0x121/0x560 mm/kasan/report.c:370
       kasan_report mm/kasan/report.c:392
       __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413
       sock_flag ./arch/x86/include/asm/bitops.h:324
       sock_wfree+0x118/0x120 net/core/sock.c:1631
       skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655
       skb_release_all+0x15/0x60 net/core/skbuff.c:668
       __kfree_skb+0x15/0x20 net/core/skbuff.c:684
       kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705
       inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
       inet_frag_put ./include/net/inet_frag.h:133
       nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617
       ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn ./include/linux/netfilter.h:102
       nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
       nf_hook ./include/linux/netfilter.h:212
       __ip6_local_out+0x52c/0xaf0 net/ipv6/output_core.c:160
       ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
       ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
       ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
       rawv6_push_pending_frames net/ipv6/raw.c:613
       rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635
       sock_sendmsg+0xca/0x110 net/socket.c:645
       sock_write_iter+0x326/0x620 net/socket.c:848
       new_sync_write fs/read_write.c:499
       __vfs_write+0x483/0x760 fs/read_write.c:512
       vfs_write+0x187/0x530 fs/read_write.c:560
       SYSC_write fs/read_write.c:607
       SyS_write+0xfb/0x230 fs/read_write.c:599
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      RIP: 0033:0x7ff26e6f5b79
      RSP: 002b:00007ff268e0ed98 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007ff268e0f9c0 RCX: 00007ff26e6f5b79
      RDX: 0000000000000010 RSI: 0000000020f50fe1 RDI: 0000000000000003
      RBP: 00007ff26ebc1220 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
      R13: 00007ff268e0f9c0 R14: 00007ff26efec040 R15: 0000000000000003
      
      The buggy address belongs to the object at ffff880062da0000
       which belongs to the cache RAWv6 of size 1504
      The buggy address ffff880062da0060 is located 96 bytes inside
       of 1504-byte region [ffff880062da0000, ffff880062da05e0)
      
      Freed by task 4113:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514
       kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
       slab_free_hook mm/slub.c:1352
       slab_free_freelist_hook mm/slub.c:1374
       slab_free mm/slub.c:2951
       kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973
       sk_prot_free net/core/sock.c:1377
       __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452
       sk_destruct+0x47/0x80 net/core/sock.c:1460
       __sk_free+0x57/0x230 net/core/sock.c:1468
       sk_free+0x23/0x30 net/core/sock.c:1479
       sock_put ./include/net/sock.h:1638
       sk_common_release+0x31e/0x4e0 net/core/sock.c:2782
       rawv6_close+0x54/0x80 net/ipv6/raw.c:1214
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431
       sock_release+0x8d/0x1e0 net/socket.c:599
       sock_close+0x16/0x20 net/socket.c:1063
       __fput+0x332/0x7f0 fs/file_table.c:208
       ____fput+0x15/0x20 fs/file_table.c:244
       task_work_run+0x19b/0x270 kernel/task_work.c:116
       exit_task_work ./include/linux/task_work.h:21
       do_exit+0x186b/0x2800 kernel/exit.c:839
       do_group_exit+0x149/0x420 kernel/exit.c:943
       SYSC_exit_group kernel/exit.c:954
       SyS_exit_group+0x1d/0x20 kernel/exit.c:952
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      
      Allocated by task 4115:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
       slab_post_alloc_hook mm/slab.h:432
       slab_alloc_node mm/slub.c:2708
       slab_alloc mm/slub.c:2716
       kmem_cache_alloc+0x1af/0x250 mm/slub.c:2721
       sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334
       sk_alloc+0x105/0x1010 net/core/sock.c:1396
       inet6_create+0x44d/0x1150 net/ipv6/af_inet6.c:183
       __sock_create+0x4f6/0x880 net/socket.c:1199
       sock_create net/socket.c:1239
       SYSC_socket net/socket.c:1269
       SyS_socket+0xf9/0x230 net/socket.c:1249
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      
      Memory state around the buggy address:
       ffff880062d9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff880062d9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff880062da0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
       ffff880062da0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff880062da0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48cac18e
    • L
      net: route: add missing nla_policy entry for RTA_MARK attribute · 3b45a410
      Liping Zhang 提交于
      This will add stricter validating for RTA_MARK attribute.
      Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b45a410
    • F
      net/ipv6: avoid possible dead locking on addr_gen_mode sysctl · 8c171d6c
      Felix Jia 提交于
      The addr_gen_mode variable can be accessed by both sysctl and netlink.
      Repleacd rtnl_lock() with rtnl_trylock() protect the sysctl operation to
      avoid the possbile dead lock.`
      Signed-off-by: NFelix Jia <felix.jia@alliedtelesis.co.nz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8c171d6c
  6. 28 2月, 2017 1 次提交
  7. 27 2月, 2017 1 次提交
  8. 25 2月, 2017 1 次提交
  9. 23 2月, 2017 1 次提交
    • A
      tcp: setup timestamp offset when write_seq already set · 00355fa5
      Alexey Kodanev 提交于
      Found that when randomized tcp offsets are enabled (by default)
      TCP client can still start new connections without them. Later,
      if server does active close and re-uses sockets in TIME-WAIT
      state, new SYN from client can be rejected on PAWS check inside
      tcp_timewait_state_process(), because either tw_ts_recent or
      rcv_tsval doesn't really have an offset set.
      
      Here is how to reproduce it with LTP netstress tool:
          netstress -R 1 &
          netstress -H 127.0.0.1 -lr 1000000 -a1
      
          [...]
          < S  seq 1956977072 win 43690 TS val 295618 ecr 459956970
          > .  ack 1956911535 win 342 TS val 459967184 ecr 1547117608
          < R  seq 1956911535 win 0 length 0
      +1. < S  seq 1956977072 win 43690 TS val 296640 ecr 459956970
          > S. seq 657450664 ack 1956977073 win 43690 TS val 459968205 ecr 296640
      
      Fixes: 95a22cae ("tcp: randomize tcp timestamp offsets for each connection")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      00355fa5
  10. 19 2月, 2017 1 次提交
  11. 16 2月, 2017 1 次提交
  12. 15 2月, 2017 5 次提交
  13. 12 2月, 2017 1 次提交
  14. 10 2月, 2017 1 次提交
  15. 09 2月, 2017 7 次提交
    • F
      37b10383
    • F
      xfrm: policy: remove family field · a2817d8b
      Florian Westphal 提交于
      Only needed it to register the policy backend at init time.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      a2817d8b
    • F
      xfrm: policy: remove garbage_collect callback · 3d7d25a6
      Florian Westphal 提交于
      Just call xfrm_garbage_collect_deferred() directly.
      This gets rid of a write to afinfo in register/unregister and allows to
      constify afinfo later on.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      3d7d25a6
    • F
      xfrm: input: constify xfrm_input_afinfo · 960fdfde
      Florian Westphal 提交于
      Nothing writes to these structures (the module owner was not used).
      
      While at it, size xfrm_input_afinfo[] by the highest existing xfrm family
      (INET6), not AF_MAX.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      960fdfde
    • E
      gro_cells: move to net/core/gro_cells.c · 97e219b7
      Eric Dumazet 提交于
      We have many gro cells users, so lets move the code to avoid
      duplication.
      
      This creates a CONFIG_GRO_CELLS option.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      97e219b7
    • W
      sit: fix a double free on error path · d7426c69
      WANG Cong 提交于
      Dmitry reported a double free in sit_init_net():
      
        kernel BUG at mm/percpu.c:689!
        invalid opcode: 0000 [#1] SMP KASAN
        Dumping ftrace buffer:
           (ftrace buffer empty)
        Modules linked in:
        CPU: 0 PID: 15692 Comm: syz-executor1 Not tainted 4.10.0-rc6-next-20170206 #1
        Hardware name: Google Google Compute Engine/Google Compute Engine,
        BIOS Google 01/01/2011
        task: ffff8801c9cc27c0 task.stack: ffff88017d1d8000
        RIP: 0010:pcpu_free_area+0x68b/0x810 mm/percpu.c:689
        RSP: 0018:ffff88017d1df488 EFLAGS: 00010046
        RAX: 0000000000010000 RBX: 00000000000007c0 RCX: ffffc90002829000
        RDX: 0000000000010000 RSI: ffffffff81940efb RDI: ffff8801db841d94
        RBP: ffff88017d1df590 R08: dffffc0000000000 R09: 1ffffffff0bb3bdd
        R10: dffffc0000000000 R11: 00000000000135dd R12: ffff8801db841d80
        R13: 0000000000038e40 R14: 00000000000007c0 R15: 00000000000007c0
        FS:  00007f6ea608f700(0000) GS:ffff8801dbe00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 000000002000aff8 CR3: 00000001c8d44000 CR4: 00000000001426f0
        DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
        Call Trace:
         free_percpu+0x212/0x520 mm/percpu.c:1264
         ipip6_dev_free+0x43/0x60 net/ipv6/sit.c:1335
         sit_init_net+0x3cb/0xa10 net/ipv6/sit.c:1831
         ops_init+0x10a/0x530 net/core/net_namespace.c:115
         setup_net+0x2ed/0x690 net/core/net_namespace.c:291
         copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
         create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
         unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
         SYSC_unshare kernel/fork.c:2281 [inline]
         SyS_unshare+0x64e/0xfc0 kernel/fork.c:2231
         entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      This is because when tunnel->dst_cache init fails, we free dev->tstats
      once in ipip6_tunnel_init() and twice in sit_init_net(). This looks
      redundant but its ndo_uinit() does not seem enough to clean up everything
      here. So avoid this by setting dev->tstats to NULL after the first free,
      at least for -net.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7426c69
    • M
      ipv6: addrconf: fix generation of new temporary addresses · a11a7f71
      Marcus Huewe 提交于
      Under some circumstances it is possible that no new temporary addresses
      will be generated.
      
      For instance, addrconf_prefix_rcv_add_addr() indirectly calls
      ipv6_create_tempaddr(), which creates a tentative temporary address and
      starts dad. Next, addrconf_prefix_rcv_add_addr() indirectly calls
      addrconf_verify_rtnl(). Now, assume that the previously created temporary
      address has the least preferred lifetime among all existing addresses and
      is still tentative (that is, dad is still running). Hence, the next run of
      addrconf_verify_rtnl() is performed when the preferred lifetime of the
      temporary address ends. If dad succeeds before the next run, the temporary
      address becomes deprecated during the next run, but no new temporary
      address is generated.
      
      In order to fix this, schedule the next addrconf_verify_rtnl() run slightly
      before the temporary address becomes deprecated, if dad succeeded.
      Signed-off-by: NMarcus Huewe <suse-tux@gmx.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a11a7f71
  16. 08 2月, 2017 5 次提交
  17. 07 2月, 2017 1 次提交
    • L
      ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches · a088d1d7
      Linus Lüssing 提交于
      When for instance a mobile Linux device roams from one access point to
      another with both APs sharing the same broadcast domain and a
      multicast snooping switch in between:
      
      1)    (c) <~~~> (AP1) <--[SSW]--> (AP2)
      
      2)              (AP1) <--[SSW]--> (AP2) <~~~> (c)
      
      Then currently IPv6 multicast packets will get lost for (c) until an
      MLD Querier sends its next query message. The packet loss occurs
      because upon roaming the Linux host so far stayed silent regarding
      MLD and the snooping switch will therefore be unaware of the
      multicast topology change for a while.
      
      This patch fixes this by always resending MLD reports when an interface
      change happens, for instance from NO-CARRIER to CARRIER state.
      Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a088d1d7
  18. 06 2月, 2017 1 次提交