1. 20 2月, 2018 14 次提交
  2. 15 2月, 2018 2 次提交
    • D
      net: Move ipv4 set_lwt_redirect helper to lwtunnel · 9942895b
      David Ahern 提交于
      IPv4 uses set_lwt_redirect to set the lwtunnel redirect functions as
      needed. Move it to lwtunnel.h as lwtunnel_set_redirect and change
      IPv6 to also use it.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9942895b
    • E
      tcp: try to keep packet if SYN_RCV race is lost · e0f9759f
      Eric Dumazet 提交于
      배석진 reported that in some situations, packets for a given 5-tuple
      end up being processed by different CPUS.
      
      This involves RPS, and fragmentation.
      
      배석진 is seeing packet drops when a SYN_RECV request socket is
      moved into ESTABLISH state. Other states are protected by socket lock.
      
      This is caused by a CPU losing the race, and simply not caring enough.
      
      Since this seems to occur frequently, we can do better and perform
      a second lookup.
      
      Note that all needed memory barriers are already in the existing code,
      thanks to the spin_lock()/spin_unlock() pair in inet_ehash_insert()
      and reqsk_put(). The second lookup must find the new socket,
      unless it has already been accepted and closed by another cpu.
      
      Note that the fragmentation could be avoided in the first place by
      use of a correct TCP MSS option in the SYN{ACK} packet, but this
      does not mean we can not be more robust.
      
      Many thanks to 배석진 for a very detailed analysis.
      Reported-by: N배석진 <soukjin.bae@samsung.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0f9759f
  3. 13 2月, 2018 2 次提交
    • K
      net: Convert addrconf_ops · 0bc9be67
      Kirill Tkhai 提交于
      These pernet_operations (un)register sysctl, which
      are not touched by anybody else.
      
      So, it's safe to make them async.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: NAndrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0bc9be67
    • D
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko 提交于
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b2c45d4
  4. 08 2月, 2018 3 次提交
  5. 07 2月, 2018 4 次提交
  6. 02 2月, 2018 2 次提交
    • P
      netfilter: flowtable infrastructure depends on NETFILTER_INGRESS · 6be3bcd7
      Pablo Neira Ayuso 提交于
      config NF_FLOW_TABLE depends on NETFILTER_INGRESS. If users forget to
      enable this toggle, flowtable registration fails with EOPNOTSUPP.
      
      Moreover, turn 'select NF_FLOW_TABLE' in every flowtable family flavour
      into dependency instead, otherwise this new dependency on
      NETFILTER_INGRESS causes a warning. This also allows us to remove the
      explicit dependency between family flowtables <-> NF_TABLES and
      NF_CONNTRACK, given they depend on the NF_FLOW_TABLE core that already
      expresses the general dependencies for this new infrastructure.
      
      Moreover, NF_FLOW_TABLE_INET depends on NF_FLOW_TABLE_IPV4 and
      NF_FLOWTABLE_IPV6, which already depends on NF_FLOW_TABLE. So we can get
      rid of direct dependency with NF_FLOW_TABLE.
      
      In general, let's avoid 'select', it just makes things more complicated.
      Reported-by: NJohn Crispin <john@phrozen.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      6be3bcd7
    • S
      netfilter: ipv6: nf_defrag: Kill frag queue on RFC2460 failure · ea23d5e3
      Subash Abhinov Kasiviswanathan 提交于
      Failures were seen in ICMPv6 fragmentation timeout tests if they were
      run after the RFC2460 failure tests. Kernel was not sending out the
      ICMPv6 fragment reassembly time exceeded packet after the fragmentation
      reassembly timeout of 1 minute had elapsed.
      
      This happened because the frag queue was not released if an error in
      IPv6 fragmentation header was detected by RFC2460.
      
      Fixes: 83f1999c ("netfilter: ipv6: nf_defrag: Pass on packets to stack per RFC2460")
      Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ea23d5e3
  7. 31 1月, 2018 2 次提交
    • P
      netfilter: on sockopt() acquire sock lock only in the required scope · 3f34cfae
      Paolo Abeni 提交于
      Syzbot reported several deadlocks in the netfilter area caused by
      rtnl lock and socket lock being acquired with a different order on
      different code paths, leading to backtraces like the following one:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.15.0-rc9+ #212 Not tainted
      ------------------------------------------------------
      syzkaller041579/3682 is trying to acquire lock:
        (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>] lock_sock
      include/net/sock.h:1463 [inline]
        (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>]
      do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167
      
      but task is already holding lock:
        (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20
      net/core/rtnetlink.c:74
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (rtnl_mutex){+.+.}:
              __mutex_lock_common kernel/locking/mutex.c:756 [inline]
              __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
              mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
              rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
              register_netdevice_notifier+0xad/0x860 net/core/dev.c:1607
              tee_tg_check+0x1a0/0x280 net/netfilter/xt_TEE.c:106
              xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845
              check_target net/ipv6/netfilter/ip6_tables.c:538 [inline]
              find_check_entry.isra.7+0x935/0xcf0
      net/ipv6/netfilter/ip6_tables.c:580
              translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749
              do_replace net/ipv6/netfilter/ip6_tables.c:1165 [inline]
              do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1691
              nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
              nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
              ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928
              udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
              sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
              SYSC_setsockopt net/socket.c:1849 [inline]
              SyS_setsockopt+0x189/0x360 net/socket.c:1828
              entry_SYSCALL_64_fastpath+0x29/0xa0
      
      -> #0 (sk_lock-AF_INET6){+.+.}:
              lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
              lock_sock_nested+0xc2/0x110 net/core/sock.c:2780
              lock_sock include/net/sock.h:1463 [inline]
              do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167
              ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922
              udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
              sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
              SYSC_setsockopt net/socket.c:1849 [inline]
              SyS_setsockopt+0x189/0x360 net/socket.c:1828
              entry_SYSCALL_64_fastpath+0x29/0xa0
      
      other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(rtnl_mutex);
                                      lock(sk_lock-AF_INET6);
                                      lock(rtnl_mutex);
         lock(sk_lock-AF_INET6);
      
        *** DEADLOCK ***
      
      1 lock held by syzkaller041579/3682:
        #0:  (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20
      net/core/rtnetlink.c:74
      
      The problem, as Florian noted, is that nf_setsockopt() is always
      called with the socket held, even if the lock itself is required only
      for very tight scopes and only for some operation.
      
      This patch addresses the issues moving the lock_sock() call only
      where really needed, namely in ipv*_getorigdst(), so that nf_setsockopt()
      does not need anymore to acquire both locks.
      
      Fixes: 22265a5c ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
      Reported-by: syzbot+a4c2dc980ac1af699b36@syzkaller.appspotmail.com
      Suggested-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      3f34cfae
    • N
      ip6mr: fix stale iterator · 4adfa79f
      Nikolay Aleksandrov 提交于
      When we dump the ip6mr mfc entries via proc, we initialize an iterator
      with the table to dump but we don't clear the cache pointer which might
      be initialized from a prior read on the same descriptor that ended. This
      can result in lock imbalance (an unnecessary unlock) leading to other
      crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
      Thanks for the reliable reproducer.
      
      Here's syzbot's trace:
       WARNING: bad unlock balance detected!
       4.15.0-rc3+ #128 Not tainted
       syzkaller971460/3195 is trying to release lock (mrt_lock) at:
       [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
       but there are no more locks to release!
      
       other info that might help us debug this:
       1 lock held by syzkaller971460/3195:
        #0:  (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0
       fs/seq_file.c:165
      
       stack backtrace:
       CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
       Google 01/01/2011
       Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x194/0x257 lib/dump_stack.c:53
        print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
        __lock_release kernel/locking/lockdep.c:3775 [inline]
        lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
        __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
        _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
        ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
        traverse+0x3bc/0xa00 fs/seq_file.c:135
        seq_read+0x96a/0x13d0 fs/seq_file.c:189
        proc_reg_read+0xef/0x170 fs/proc/inode.c:217
        do_loop_readv_writev fs/read_write.c:673 [inline]
        do_iter_read+0x3db/0x5b0 fs/read_write.c:897
        compat_readv+0x1bf/0x270 fs/read_write.c:1140
        do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
        C_SYSC_preadv fs/read_write.c:1209 [inline]
        compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
        do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
        do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
        entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
       RIP: 0023:0xf7f73c79
       RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
       RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
       RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
       RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       BUG: sleeping function called from invalid context at lib/usercopy.c:25
       in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
       INFO: lockdep is turned off.
       CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
       Google 01/01/2011
       Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x194/0x257 lib/dump_stack.c:53
        ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
        __might_sleep+0x95/0x190 kernel/sched/core.c:6013
        __might_fault+0xab/0x1d0 mm/memory.c:4525
        _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
        copy_to_user include/linux/uaccess.h:155 [inline]
        seq_read+0xcb4/0x13d0 fs/seq_file.c:279
        proc_reg_read+0xef/0x170 fs/proc/inode.c:217
        do_loop_readv_writev fs/read_write.c:673 [inline]
        do_iter_read+0x3db/0x5b0 fs/read_write.c:897
        compat_readv+0x1bf/0x270 fs/read_write.c:1140
        do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
        C_SYSC_preadv fs/read_write.c:1209 [inline]
        compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
        do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
        do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
        entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
       RIP: 0023:0xf7f73c79
       RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
       RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
       RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
       RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
       lib/usercopy.c:26
      Reported-by: Nsyzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4adfa79f
  8. 30 1月, 2018 4 次提交
    • E
      ipv6: addrconf: break critical section in addrconf_verify_rtnl() · e64e469b
      Eric Dumazet 提交于
      Heiner reported a lockdep splat [1]
      
      This is caused by attempting GFP_KERNEL allocation while RCU lock is
      held and BH blocked.
      
      We believe that addrconf_verify_rtnl() could run for a long period,
      so instead of using GFP_ATOMIC here as Ido suggested, we should break
      the critical section and restart it after the allocation.
      
      [1]
      [86220.125562] =============================
      [86220.125586] WARNING: suspicious RCU usage
      [86220.125612] 4.15.0-rc7-next-20180110+ #7 Not tainted
      [86220.125641] -----------------------------
      [86220.125666] kernel/sched/core.c:6026 Illegal context switch in RCU-bh read-side critical section!
      [86220.125711]
                     other info that might help us debug this:
      
      [86220.125755]
                     rcu_scheduler_active = 2, debug_locks = 1
      [86220.125792] 4 locks held by kworker/0:2/1003:
      [86220.125817]  #0:  ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680
      [86220.125895]  #1:  ((addr_chk_work).work){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680
      [86220.125959]  #2:  (rtnl_mutex){+.+.}, at: [<00000000b06d9510>] rtnl_lock+0x12/0x20
      [86220.126017]  #3:  (rcu_read_lock_bh){....}, at: [<00000000aef52299>] addrconf_verify_rtnl+0x1e/0x510 [ipv6]
      [86220.126111]
                     stack backtrace:
      [86220.126142] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7
      [86220.126185] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015
      [86220.126250] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
      [86220.126288] Call Trace:
      [86220.126312]  dump_stack+0x70/0x9e
      [86220.126337]  lockdep_rcu_suspicious+0xce/0xf0
      [86220.126365]  ___might_sleep+0x1d3/0x240
      [86220.126390]  __might_sleep+0x45/0x80
      [86220.126416]  kmem_cache_alloc_trace+0x53/0x250
      [86220.126458]  ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
      [86220.126498]  ipv6_add_addr+0xfe/0x6e0 [ipv6]
      [86220.126538]  ipv6_create_tempaddr+0x24d/0x430 [ipv6]
      [86220.126580]  ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
      [86220.126623]  addrconf_verify_rtnl+0x339/0x510 [ipv6]
      [86220.126664]  ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
      [86220.126708]  addrconf_verify_work+0xe/0x20 [ipv6]
      [86220.126738]  process_one_work+0x258/0x680
      [86220.126765]  worker_thread+0x35/0x3f0
      [86220.126790]  kthread+0x124/0x140
      [86220.126813]  ? process_one_work+0x680/0x680
      [86220.126839]  ? kthread_create_worker_on_cpu+0x40/0x40
      [86220.126869]  ? umh_complete+0x40/0x40
      [86220.126893]  ? call_usermodehelper_exec_async+0x12a/0x160
      [86220.126926]  ret_from_fork+0x4b/0x60
      [86220.126999] BUG: sleeping function called from invalid context at mm/slab.h:420
      [86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: kworker/0:2
      [86220.127082] 4 locks held by kworker/0:2/1003:
      [86220.127107]  #0:  ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680
      [86220.127179]  #1:  ((addr_chk_work).work){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680
      [86220.127242]  #2:  (rtnl_mutex){+.+.}, at: [<00000000b06d9510>] rtnl_lock+0x12/0x20
      [86220.127300]  #3:  (rcu_read_lock_bh){....}, at: [<00000000aef52299>] addrconf_verify_rtnl+0x1e/0x510 [ipv6]
      [86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7
      [86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015
      [86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
      [86220.127568] Call Trace:
      [86220.127591]  dump_stack+0x70/0x9e
      [86220.127616]  ___might_sleep+0x14d/0x240
      [86220.127644]  __might_sleep+0x45/0x80
      [86220.127672]  kmem_cache_alloc_trace+0x53/0x250
      [86220.127717]  ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
      [86220.127762]  ipv6_add_addr+0xfe/0x6e0 [ipv6]
      [86220.127807]  ipv6_create_tempaddr+0x24d/0x430 [ipv6]
      [86220.127854]  ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
      [86220.127903]  addrconf_verify_rtnl+0x339/0x510 [ipv6]
      [86220.127950]  ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
      [86220.127998]  addrconf_verify_work+0xe/0x20 [ipv6]
      [86220.128032]  process_one_work+0x258/0x680
      [86220.128063]  worker_thread+0x35/0x3f0
      [86220.128091]  kthread+0x124/0x140
      [86220.128117]  ? process_one_work+0x680/0x680
      [86220.128146]  ? kthread_create_worker_on_cpu+0x40/0x40
      [86220.128180]  ? umh_complete+0x40/0x40
      [86220.128207]  ? call_usermodehelper_exec_async+0x12a/0x160
      [86220.128243]  ret_from_fork+0x4b/0x60
      
      Fixes: f3d9832e ("ipv6: addrconf: cleanup locking in ipv6_add_addr")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e64e469b
    • W
      ipv6: change route cache aging logic · 31afeb42
      Wei Wang 提交于
      In current route cache aging logic, if a route has both RTF_EXPIRE and
      RTF_GATEWAY set, the route will only be removed if the neighbor cache
      has no NTF_ROUTER flag. Otherwise, even if the route has expired, it
      won't get deleted.
      Fix this logic to always check if the route has expired first and then
      do the gateway neighbor cache check if previous check decide to not
      remove the exception entry.
      
      Fixes: 1859bac0 ("ipv6: remove from fib tree aged out RTF_CACHE dst")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31afeb42
    • D
      net: ipv6: send unsolicited NA after DAD · c76fe2d9
      David Ahern 提交于
      Unsolicited IPv6 neighbor advertisements should be sent after DAD
      completes. Update ndisc_send_unsol_na to skip tentative, non-optimistic
      addresses and have those sent by addrconf_dad_completed after DAD.
      
      Fixes: 4a6e3c5d ("net: ipv6: send unsolicited NA on admin up")
      Reported-by: NVivek Venkatraman <vivek@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c76fe2d9
    • M
      ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6only · 7ece54a6
      Martin KaFai Lau 提交于
      If a sk_v6_rcv_saddr is !IPV6_ADDR_ANY and !IPV6_ADDR_MAPPED, it
      implicitly implies it is an ipv6only socket.  However, in inet6_bind(),
      this addr_type checking and setting sk->sk_ipv6only to 1 are only done
      after sk->sk_prot->get_port(sk, snum) has been completed successfully.
      
      This inconsistency between sk_v6_rcv_saddr and sk_ipv6only confuses
      the 'get_port()'.
      
      In particular, when binding SO_REUSEPORT UDP sockets,
      udp_reuseport_add_sock(sk,...) is called.  udp_reuseport_add_sock()
      checks "ipv6_only_sock(sk2) == ipv6_only_sock(sk)" before adding sk to
      sk2->sk_reuseport_cb.  In this case, ipv6_only_sock(sk2) could be
      1 while ipv6_only_sock(sk) is still 0 here.  The end result is,
      reuseport_alloc(sk) is called instead of adding sk to the existing
      sk2->sk_reuseport_cb.
      
      It can be reproduced by binding two SO_REUSEPORT UDP sockets on an
      IPv6 address (!ANY and !MAPPED).  Only one of the socket will
      receive packet.
      
      The fix is to set the implicit sk_ipv6only before calling get_port().
      The original sk_ipv6only has to be saved such that it can be restored
      in case get_port() failed.  The situation is similar to the
      inet_reset_saddr(sk) after get_port() has failed.
      
      Thanks to Calvin Owens <calvinowens@fb.com> who created an easy
      reproduction which leads to a fix.
      
      Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ece54a6
  9. 26 1月, 2018 6 次提交
  10. 24 1月, 2018 1 次提交