1. 10 6月, 2017 1 次提交
    • K
      Ipvlan should return an error when an address is already in use. · 3ad7d246
      Krister Johansen 提交于
      The ipvlan code already knows how to detect when a duplicate address is
      about to be assigned to an ipvlan device.  However, that failure is not
      propogated outward and leads to a silent failure.
      
      Introduce a validation step at ip address creation time and allow device
      drivers to register to validate the incoming ip addresses.  The ipvlan
      code is the first consumer.  If it detects an address in use, we can
      return an error to the user before beginning to commit the new ifa in
      the networking code.
      
      This can be especially useful if it is necessary to provision many
      ipvlans in containers.  The provisioning software (or operator) can use
      this to detect situations where an ip address is unexpectedly in use.
      Signed-off-by: NKrister Johansen <kjlx@templeofstupid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3ad7d246
  2. 08 6月, 2017 4 次提交
  3. 07 6月, 2017 1 次提交
  4. 06 6月, 2017 1 次提交
  5. 05 6月, 2017 5 次提交
  6. 03 6月, 2017 1 次提交
  7. 30 5月, 2017 2 次提交
  8. 27 5月, 2017 2 次提交
  9. 23 5月, 2017 2 次提交
  10. 22 5月, 2017 4 次提交
  11. 20 5月, 2017 1 次提交
  12. 18 5月, 2017 4 次提交
    • P
      udp: make *udp*_queue_rcv_skb() functions static · a3f96c47
      Paolo Abeni 提交于
      Since the udp memory accounting refactor, we don't need any more
      to export the *udp*_queue_rcv_skb(). Make them static and fix
      a couple of sparse warnings:
      
      net/ipv4/udp.c:1615:5: warning: symbol 'udp_queue_rcv_skb' was not
      declared. Should it be static?
      net/ipv6/udp.c:572:5: warning: symbol 'udpv6_queue_rcv_skb' was not
      declared. Should it be static?
      
      Fixes: 850cbadd ("udp: use it's own memory accounting schema")
      Fixes: c915fe13 ("udplite: fix NULL pointer dereference")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3f96c47
    • D
      ipv6: Check ip6_find_1stfragopt() return value properly. · 7dd7eb95
      David S. Miller 提交于
      Do not use unsigned variables to see if it returns a negative
      error or not.
      
      Fixes: 2423496a ("ipv6: Prevent overrun when parsing v6 header options")
      Reported-by: NJulia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7dd7eb95
    • E
      tcp: switch TCP TS option (RFC 7323) to 1ms clock · 9a568de4
      Eric Dumazet 提交于
      TCP Timestamps option is defined in RFC 7323
      
      Traditionally on linux, it has been tied to the internal
      'jiffies' variable, because it had been a cheap and good enough
      generator.
      
      For TCP flows on the Internet, 1 ms resolution would be much better
      than 4ms or 10ms (HZ=250 or HZ=100 respectively)
      
      For TCP flows in the DC, Google has used usec resolution for more
      than two years with great success [1]
      
      Receive size autotuning (DRS) is indeed more precise and converges
      faster to optimal window size.
      
      This patch converts tp->tcp_mstamp to a plain u64 value storing
      a 1 usec TCP clock.
      
      This choice will allow us to upstream the 1 usec TS option as
      discussed in IETF 97.
      
      [1] https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdfSigned-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a568de4
    • C
      ipv6: Prevent overrun when parsing v6 header options · 2423496a
      Craig Gallek 提交于
      The KASAN warning repoted below was discovered with a syzkaller
      program.  The reproducer is basically:
        int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
        send(s, &one_byte_of_data, 1, MSG_MORE);
        send(s, &more_than_mtu_bytes_data, 2000, 0);
      
      The socket() call sets the nexthdr field of the v6 header to
      NEXTHDR_HOP, the first send call primes the payload with a non zero
      byte of data, and the second send call triggers the fragmentation path.
      
      The fragmentation code tries to parse the header options in order
      to figure out where to insert the fragment option.  Since nexthdr points
      to an invalid option, the calculation of the size of the network header
      can made to be much larger than the linear section of the skb and data
      is read outside of it.
      
      This fix makes ip6_find_1stfrag return an error if it detects
      running out-of-bounds.
      
      [   42.361487] ==================================================================
      [   42.364412] BUG: KASAN: slab-out-of-bounds in ip6_fragment+0x11c8/0x3730
      [   42.365471] Read of size 840 at addr ffff88000969e798 by task ip6_fragment-oo/3789
      [   42.366469]
      [   42.366696] CPU: 1 PID: 3789 Comm: ip6_fragment-oo Not tainted 4.11.0+ #41
      [   42.367628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
      [   42.368824] Call Trace:
      [   42.369183]  dump_stack+0xb3/0x10b
      [   42.369664]  print_address_description+0x73/0x290
      [   42.370325]  kasan_report+0x252/0x370
      [   42.370839]  ? ip6_fragment+0x11c8/0x3730
      [   42.371396]  check_memory_region+0x13c/0x1a0
      [   42.371978]  memcpy+0x23/0x50
      [   42.372395]  ip6_fragment+0x11c8/0x3730
      [   42.372920]  ? nf_ct_expect_unregister_notifier+0x110/0x110
      [   42.373681]  ? ip6_copy_metadata+0x7f0/0x7f0
      [   42.374263]  ? ip6_forward+0x2e30/0x2e30
      [   42.374803]  ip6_finish_output+0x584/0x990
      [   42.375350]  ip6_output+0x1b7/0x690
      [   42.375836]  ? ip6_finish_output+0x990/0x990
      [   42.376411]  ? ip6_fragment+0x3730/0x3730
      [   42.376968]  ip6_local_out+0x95/0x160
      [   42.377471]  ip6_send_skb+0xa1/0x330
      [   42.377969]  ip6_push_pending_frames+0xb3/0xe0
      [   42.378589]  rawv6_sendmsg+0x2051/0x2db0
      [   42.379129]  ? rawv6_bind+0x8b0/0x8b0
      [   42.379633]  ? _copy_from_user+0x84/0xe0
      [   42.380193]  ? debug_check_no_locks_freed+0x290/0x290
      [   42.380878]  ? ___sys_sendmsg+0x162/0x930
      [   42.381427]  ? rcu_read_lock_sched_held+0xa3/0x120
      [   42.382074]  ? sock_has_perm+0x1f6/0x290
      [   42.382614]  ? ___sys_sendmsg+0x167/0x930
      [   42.383173]  ? lock_downgrade+0x660/0x660
      [   42.383727]  inet_sendmsg+0x123/0x500
      [   42.384226]  ? inet_sendmsg+0x123/0x500
      [   42.384748]  ? inet_recvmsg+0x540/0x540
      [   42.385263]  sock_sendmsg+0xca/0x110
      [   42.385758]  SYSC_sendto+0x217/0x380
      [   42.386249]  ? SYSC_connect+0x310/0x310
      [   42.386783]  ? __might_fault+0x110/0x1d0
      [   42.387324]  ? lock_downgrade+0x660/0x660
      [   42.387880]  ? __fget_light+0xa1/0x1f0
      [   42.388403]  ? __fdget+0x18/0x20
      [   42.388851]  ? sock_common_setsockopt+0x95/0xd0
      [   42.389472]  ? SyS_setsockopt+0x17f/0x260
      [   42.390021]  ? entry_SYSCALL_64_fastpath+0x5/0xbe
      [   42.390650]  SyS_sendto+0x40/0x50
      [   42.391103]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.391731] RIP: 0033:0x7fbbb711e383
      [   42.392217] RSP: 002b:00007ffff4d34f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   42.393235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbbb711e383
      [   42.394195] RDX: 0000000000001000 RSI: 00007ffff4d34f60 RDI: 0000000000000003
      [   42.395145] RBP: 0000000000000046 R08: 00007ffff4d34f40 R09: 0000000000000018
      [   42.396056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400aad
      [   42.396598] R13: 0000000000000066 R14: 00007ffff4d34ee0 R15: 00007fbbb717af00
      [   42.397257]
      [   42.397411] Allocated by task 3789:
      [   42.397702]  save_stack_trace+0x16/0x20
      [   42.398005]  save_stack+0x46/0xd0
      [   42.398267]  kasan_kmalloc+0xad/0xe0
      [   42.398548]  kasan_slab_alloc+0x12/0x20
      [   42.398848]  __kmalloc_node_track_caller+0xcb/0x380
      [   42.399224]  __kmalloc_reserve.isra.32+0x41/0xe0
      [   42.399654]  __alloc_skb+0xf8/0x580
      [   42.400003]  sock_wmalloc+0xab/0xf0
      [   42.400346]  __ip6_append_data.isra.41+0x2472/0x33d0
      [   42.400813]  ip6_append_data+0x1a8/0x2f0
      [   42.401122]  rawv6_sendmsg+0x11ee/0x2db0
      [   42.401505]  inet_sendmsg+0x123/0x500
      [   42.401860]  sock_sendmsg+0xca/0x110
      [   42.402209]  ___sys_sendmsg+0x7cb/0x930
      [   42.402582]  __sys_sendmsg+0xd9/0x190
      [   42.402941]  SyS_sendmsg+0x2d/0x50
      [   42.403273]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.403718]
      [   42.403871] Freed by task 1794:
      [   42.404146]  save_stack_trace+0x16/0x20
      [   42.404515]  save_stack+0x46/0xd0
      [   42.404827]  kasan_slab_free+0x72/0xc0
      [   42.405167]  kfree+0xe8/0x2b0
      [   42.405462]  skb_free_head+0x74/0xb0
      [   42.405806]  skb_release_data+0x30e/0x3a0
      [   42.406198]  skb_release_all+0x4a/0x60
      [   42.406563]  consume_skb+0x113/0x2e0
      [   42.406910]  skb_free_datagram+0x1a/0xe0
      [   42.407288]  netlink_recvmsg+0x60d/0xe40
      [   42.407667]  sock_recvmsg+0xd7/0x110
      [   42.408022]  ___sys_recvmsg+0x25c/0x580
      [   42.408395]  __sys_recvmsg+0xd6/0x190
      [   42.408753]  SyS_recvmsg+0x2d/0x50
      [   42.409086]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.409513]
      [   42.409665] The buggy address belongs to the object at ffff88000969e780
      [   42.409665]  which belongs to the cache kmalloc-512 of size 512
      [   42.410846] The buggy address is located 24 bytes inside of
      [   42.410846]  512-byte region [ffff88000969e780, ffff88000969e980)
      [   42.411941] The buggy address belongs to the page:
      [   42.412405] page:ffffea000025a780 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
      [   42.413298] flags: 0x100000000008100(slab|head)
      [   42.413729] raw: 0100000000008100 0000000000000000 0000000000000000 00000001800c000c
      [   42.414387] raw: ffffea00002a9500 0000000900000007 ffff88000c401280 0000000000000000
      [   42.415074] page dumped because: kasan: bad access detected
      [   42.415604]
      [   42.415757] Memory state around the buggy address:
      [   42.416222]  ffff88000969e880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   42.416904]  ffff88000969e900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   42.417591] >ffff88000969e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   42.418273]                    ^
      [   42.418588]  ffff88000969ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   42.419273]  ffff88000969ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   42.419882] ==================================================================
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NCraig Gallek <kraig@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2423496a
  13. 17 5月, 2017 1 次提交
    • P
      udp: use a separate rx queue for packet reception · 2276f58a
      Paolo Abeni 提交于
      under udp flood the sk_receive_queue spinlock is heavily contended.
      This patch try to reduce the contention on such lock adding a
      second receive queue to the udp sockets; recvmsg() looks first
      in such queue and, only if empty, tries to fetch the data from
      sk_receive_queue. The latter is spliced into the newly added
      queue every time the receive path has to acquire the
      sk_receive_queue lock.
      
      The accounting of forward allocated memory is still protected with
      the sk_receive_queue lock, so udp_rmem_release() needs to acquire
      both locks when the forward deficit is flushed.
      
      On specific scenarios we can end up acquiring and releasing the
      sk_receive_queue lock multiple times; that will be covered by
      the next patch
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2276f58a
  14. 16 5月, 2017 1 次提交
    • M
      ipv6: avoid dad-failures for addresses with NODAD · 66eb9f86
      Mahesh Bandewar 提交于
      Every address gets added with TENTATIVE flag even for the addresses with
      IFA_F_NODAD flag and dad-work is scheduled for them. During this DAD process
      we realize it's an address with NODAD and complete the process without
      sending any probe. However the TENTATIVE flags stays on the
      address for sometime enough to cause misinterpretation when we receive a NS.
      While processing NS, if the address has TENTATIVE flag, we mark it DADFAILED
      and endup with an address that was originally configured as NODAD with
      DADFAILED.
      
      We can't avoid scheduling dad_work for addresses with NODAD but we can
      avoid adding TENTATIVE flag to avoid this racy situation.
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66eb9f86
  15. 12 5月, 2017 1 次提交
  16. 09 5月, 2017 2 次提交
  17. 06 5月, 2017 1 次提交
    • E
      tcp: randomize timestamps on syncookies · 84b114b9
      Eric Dumazet 提交于
      Whole point of randomization was to hide server uptime, but an attacker
      can simply start a syn flood and TCP generates 'old style' timestamps,
      directly revealing server jiffies value.
      
      Also, TSval sent by the server to a particular remote address vary
      depending on syncookies being sent or not, potentially triggering PAWS
      drops for innocent clients.
      
      Lets implement proper randomization, including for SYNcookies.
      
      Also we do not need to export sysctl_tcp_timestamps, since it is not
      used from a module.
      
      In v2, I added Florian feedback and contribution, adding tsoff to
      tcp_get_cookie_sock().
      
      v3 removed one unused variable in tcp_v4_connect() as Florian spotted.
      
      Fixes: 95a22cae ("tcp: randomize tcp timestamp offsets for each connection")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NFlorian Westphal <fw@strlen.de>
      Tested-by: NFlorian Westphal <fw@strlen.de>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84b114b9
  18. 05 5月, 2017 1 次提交
  19. 04 5月, 2017 1 次提交
    • A
      ipv4, ipv6: ensure raw socket message is big enough to hold an IP header · 86f4c90a
      Alexander Potapenko 提交于
      raw_send_hdrinc() and rawv6_send_hdrinc() expect that the buffer copied
      from the userspace contains the IPv4/IPv6 header, so if too few bytes are
      copied, parts of the header may remain uninitialized.
      
      This bug has been detected with KMSAN.
      
      For the record, the KMSAN report:
      
      ==================================================================
      BUG: KMSAN: use of unitialized memory in nf_ct_frag6_gather+0xf5a/0x44a0
      inter: 0
      CPU: 0 PID: 1036 Comm: probe Not tainted 4.11.0-rc5+ #2455
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16
       dump_stack+0x143/0x1b0 lib/dump_stack.c:52
       kmsan_report+0x16b/0x1e0 mm/kmsan/kmsan.c:1078
       __kmsan_warning_32+0x5c/0xa0 mm/kmsan/kmsan_instr.c:510
       nf_ct_frag6_gather+0xf5a/0x44a0 net/ipv6/netfilter/nf_conntrack_reasm.c:577
       ipv6_defrag+0x1d9/0x280 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn ./include/linux/netfilter.h:102
       nf_hook_slow+0x13f/0x3c0 net/netfilter/core.c:310
       nf_hook ./include/linux/netfilter.h:212
       NF_HOOK ./include/linux/netfilter.h:255
       rawv6_send_hdrinc net/ipv6/raw.c:673
       rawv6_sendmsg+0x2fcb/0x41a0 net/ipv6/raw.c:919
       inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       SYSC_sendto+0x6a5/0x7c0 net/socket.c:1696
       SyS_sendto+0xbc/0xe0 net/socket.c:1664
       do_syscall_64+0x72/0xa0 arch/x86/entry/common.c:285
       entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
      RIP: 0033:0x436e03
      RSP: 002b:00007ffce48baf38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000436e03
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 00007ffce48baf90 R08: 00007ffce48baf50 R09: 000000000000001c
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000401790 R14: 0000000000401820 R15: 0000000000000000
      origin: 00000000d9400053
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:362
       kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:257
       kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:270
       slab_alloc_node mm/slub.c:2735
       __kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4341
       __kmalloc_reserve net/core/skbuff.c:138
       __alloc_skb+0x2cd/0x740 net/core/skbuff.c:231
       alloc_skb ./include/linux/skbuff.h:933
       alloc_skb_with_frags+0x209/0xbc0 net/core/skbuff.c:4678
       sock_alloc_send_pskb+0x9ff/0xe00 net/core/sock.c:1903
       sock_alloc_send_skb+0xe4/0x100 net/core/sock.c:1920
       rawv6_send_hdrinc net/ipv6/raw.c:638
       rawv6_sendmsg+0x2918/0x41a0 net/ipv6/raw.c:919
       inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       SYSC_sendto+0x6a5/0x7c0 net/socket.c:1696
       SyS_sendto+0xbc/0xe0 net/socket.c:1664
       do_syscall_64+0x72/0xa0 arch/x86/entry/common.c:285
       return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
      ==================================================================
      
      , triggered by the following syscalls:
        socket(PF_INET6, SOCK_RAW, IPPROTO_RAW) = 3
        sendto(3, NULL, 0, 0, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "ff00::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EPERM
      
      A similar report is triggered in net/ipv4/raw.c if we use a PF_INET socket
      instead of a PF_INET6 one.
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86f4c90a
  20. 03 5月, 2017 1 次提交
    • D
      net: ipv6: Do not duplicate DAD on link up · 6d717134
      David Ahern 提交于
      Andrey reported a warning triggered by the rcu code:
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 5911 at lib/debugobjects.c:289
      debug_print_object+0x175/0x210
      ODEBUG: activate active (active state 1) object type: rcu_head hint:
              (null)
      Modules linked in:
      CPU: 1 PID: 5911 Comm: a.out Not tainted 4.11.0-rc8+ #271
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16
       dump_stack+0x192/0x22d lib/dump_stack.c:52
       __warn+0x19f/0x1e0 kernel/panic.c:549
       warn_slowpath_fmt+0xe0/0x120 kernel/panic.c:564
       debug_print_object+0x175/0x210 lib/debugobjects.c:286
       debug_object_activate+0x574/0x7e0 lib/debugobjects.c:442
       debug_rcu_head_queue kernel/rcu/rcu.h:75
       __call_rcu.constprop.76+0xff/0x9c0 kernel/rcu/tree.c:3229
       call_rcu_sched+0x12/0x20 kernel/rcu/tree.c:3288
       rt6_rcu_free net/ipv6/ip6_fib.c:158
       rt6_release+0x1ea/0x290 net/ipv6/ip6_fib.c:188
       fib6_del_route net/ipv6/ip6_fib.c:1461
       fib6_del+0xa42/0xdc0 net/ipv6/ip6_fib.c:1500
       __ip6_del_rt+0x100/0x160 net/ipv6/route.c:2174
       ip6_del_rt+0x140/0x1b0 net/ipv6/route.c:2187
       __ipv6_ifa_notify+0x269/0x780 net/ipv6/addrconf.c:5520
       addrconf_ifdown+0xe60/0x1a20 net/ipv6/addrconf.c:3672
      ...
      
      Andrey's reproducer program runs in a very tight loop, calling
      'unshare -n' and then spawning 2 sets of 14 threads running random ioctl
      calls. The relevant networking sequence:
      
      1. New network namespace created via unshare -n
      - ip6tnl0 device is created in down state
      
      2. address added to ip6tnl0
      - equivalent to ip -6 addr add dev ip6tnl0 fd00::bb/1
      - DAD is started on the address and when it completes the host
        route is inserted into the FIB
      
      3. ip6tnl0 is brought up
      - the new fixup_permanent_addr function restarts DAD on the address
      
      4. exit namespace
      - teardown / cleanup sequence starts
      - once in a blue moon, lo teardown appears to happen BEFORE teardown
        of ip6tunl0
        + down on 'lo' removes the host route from the FIB since the dst->dev
          for the route is loobback
        + host route added to rcu callback list
          * rcu callback has not run yet, so rt is NOT on the gc list so it has
            NOT been marked obsolete
      
      5. in parallel to 4. worker_thread runs addrconf_dad_completed
      - DAD on the address on ip6tnl0 completes
      - calls ipv6_ifa_notify which inserts the host route
      
      All of that happens very quickly. The result is that a host route that
      has been deleted from the IPv6 FIB and added to the RCU list is re-inserted
      into the FIB.
      
      The exit namespace eventually gets to cleaning up ip6tnl0 which removes the
      host route from the FIB again, calls the rcu function for cleanup -- and
      triggers the double rcu trace.
      
      The root cause is duplicate DAD on the address -- steps 2 and 3. Arguably,
      DAD should not be started in step 2. The interface is in the down state,
      so it can not really send out requests for the address which makes starting
      DAD pointless.
      
      Since the second DAD was introduced by a recent change, seems appropriate
      to use it for the Fixes tag and have the fixup function only start DAD for
      addresses in the PREDAD state which occurs in addrconf_ifdown if the
      address is retained.
      
      Big thanks to Andrey for isolating a reliable reproducer for this problem.
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d717134
  21. 02 5月, 2017 2 次提交
    • D
      ipv6: Need to export ipv6_push_frag_opts for tunneling now. · 5b8481fa
      David S. Miller 提交于
      Since that change also made the nfrag function not necessary
      for exports, remove it.
      
      Fixes: 89a23c8b ("ip6_tunnel: Fix missing tunnel encapsulation limit option")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b8481fa
    • C
      ip6_tunnel: Fix missing tunnel encapsulation limit option · 89a23c8b
      Craig Gallek 提交于
      The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
      IPV6_TLV_PADN options when an encapsulation limit is defined (the
      default is a limit of 4).  An MTU adjustment is done to account for
      these options as well.  However, the options are never present in the
      generated packets.
      
      The issue appears to be a subtlety between IPV6_DSTOPTS and
      IPV6_RTHDRDSTOPTS defined in RFC 3542.  When the IPIP tunnel driver was
      written, the encap limit options were included as IPV6_RTHDRDSTOPTS in
      dst0opt of struct ipv6_txoptions.  Later, ipv6_push_nfrags_opts was
      (correctly) updated to require IPV6_RTHDR options when IPV6_RTHDRDSTOPTS
      are to be used.  This caused the options to no longer be included in v6
      encapsulated packets.
      
      The fix is to use IPV6_DSTOPTS (in dst1opt of struct ipv6_txoptions)
      instead.  IPV6_DSTOPTS do not have the additional IPV6_RTHDR requirement.
      
      Fixes: 1df64a8569c7: ("[IPV6]: Add ip6ip6 tunnel driver.")
      Fixes: 333fad53: ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542)")
      Signed-off-by: NCraig Gallek <kraig@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89a23c8b
  22. 27 4月, 2017 1 次提交