1. 31 1月, 2018 2 次提交
    • P
      netfilter: on sockopt() acquire sock lock only in the required scope · 3f34cfae
      Paolo Abeni 提交于
      Syzbot reported several deadlocks in the netfilter area caused by
      rtnl lock and socket lock being acquired with a different order on
      different code paths, leading to backtraces like the following one:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.15.0-rc9+ #212 Not tainted
      ------------------------------------------------------
      syzkaller041579/3682 is trying to acquire lock:
        (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>] lock_sock
      include/net/sock.h:1463 [inline]
        (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>]
      do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167
      
      but task is already holding lock:
        (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20
      net/core/rtnetlink.c:74
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (rtnl_mutex){+.+.}:
              __mutex_lock_common kernel/locking/mutex.c:756 [inline]
              __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
              mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
              rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
              register_netdevice_notifier+0xad/0x860 net/core/dev.c:1607
              tee_tg_check+0x1a0/0x280 net/netfilter/xt_TEE.c:106
              xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845
              check_target net/ipv6/netfilter/ip6_tables.c:538 [inline]
              find_check_entry.isra.7+0x935/0xcf0
      net/ipv6/netfilter/ip6_tables.c:580
              translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749
              do_replace net/ipv6/netfilter/ip6_tables.c:1165 [inline]
              do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1691
              nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
              nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
              ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928
              udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
              sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
              SYSC_setsockopt net/socket.c:1849 [inline]
              SyS_setsockopt+0x189/0x360 net/socket.c:1828
              entry_SYSCALL_64_fastpath+0x29/0xa0
      
      -> #0 (sk_lock-AF_INET6){+.+.}:
              lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
              lock_sock_nested+0xc2/0x110 net/core/sock.c:2780
              lock_sock include/net/sock.h:1463 [inline]
              do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167
              ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922
              udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
              sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
              SYSC_setsockopt net/socket.c:1849 [inline]
              SyS_setsockopt+0x189/0x360 net/socket.c:1828
              entry_SYSCALL_64_fastpath+0x29/0xa0
      
      other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(rtnl_mutex);
                                      lock(sk_lock-AF_INET6);
                                      lock(rtnl_mutex);
         lock(sk_lock-AF_INET6);
      
        *** DEADLOCK ***
      
      1 lock held by syzkaller041579/3682:
        #0:  (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20
      net/core/rtnetlink.c:74
      
      The problem, as Florian noted, is that nf_setsockopt() is always
      called with the socket held, even if the lock itself is required only
      for very tight scopes and only for some operation.
      
      This patch addresses the issues moving the lock_sock() call only
      where really needed, namely in ipv*_getorigdst(), so that nf_setsockopt()
      does not need anymore to acquire both locks.
      
      Fixes: 22265a5c ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
      Reported-by: syzbot+a4c2dc980ac1af699b36@syzkaller.appspotmail.com
      Suggested-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      3f34cfae
    • N
      ip6mr: fix stale iterator · 4adfa79f
      Nikolay Aleksandrov 提交于
      When we dump the ip6mr mfc entries via proc, we initialize an iterator
      with the table to dump but we don't clear the cache pointer which might
      be initialized from a prior read on the same descriptor that ended. This
      can result in lock imbalance (an unnecessary unlock) leading to other
      crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
      Thanks for the reliable reproducer.
      
      Here's syzbot's trace:
       WARNING: bad unlock balance detected!
       4.15.0-rc3+ #128 Not tainted
       syzkaller971460/3195 is trying to release lock (mrt_lock) at:
       [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
       but there are no more locks to release!
      
       other info that might help us debug this:
       1 lock held by syzkaller971460/3195:
        #0:  (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0
       fs/seq_file.c:165
      
       stack backtrace:
       CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
       Google 01/01/2011
       Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x194/0x257 lib/dump_stack.c:53
        print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
        __lock_release kernel/locking/lockdep.c:3775 [inline]
        lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
        __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
        _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
        ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
        traverse+0x3bc/0xa00 fs/seq_file.c:135
        seq_read+0x96a/0x13d0 fs/seq_file.c:189
        proc_reg_read+0xef/0x170 fs/proc/inode.c:217
        do_loop_readv_writev fs/read_write.c:673 [inline]
        do_iter_read+0x3db/0x5b0 fs/read_write.c:897
        compat_readv+0x1bf/0x270 fs/read_write.c:1140
        do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
        C_SYSC_preadv fs/read_write.c:1209 [inline]
        compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
        do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
        do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
        entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
       RIP: 0023:0xf7f73c79
       RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
       RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
       RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
       RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       BUG: sleeping function called from invalid context at lib/usercopy.c:25
       in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
       INFO: lockdep is turned off.
       CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
       Google 01/01/2011
       Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x194/0x257 lib/dump_stack.c:53
        ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
        __might_sleep+0x95/0x190 kernel/sched/core.c:6013
        __might_fault+0xab/0x1d0 mm/memory.c:4525
        _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
        copy_to_user include/linux/uaccess.h:155 [inline]
        seq_read+0xcb4/0x13d0 fs/seq_file.c:279
        proc_reg_read+0xef/0x170 fs/proc/inode.c:217
        do_loop_readv_writev fs/read_write.c:673 [inline]
        do_iter_read+0x3db/0x5b0 fs/read_write.c:897
        compat_readv+0x1bf/0x270 fs/read_write.c:1140
        do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
        C_SYSC_preadv fs/read_write.c:1209 [inline]
        compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
        do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
        do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
        entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
       RIP: 0023:0xf7f73c79
       RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
       RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
       RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
       RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
       lib/usercopy.c:26
      Reported-by: Nsyzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4adfa79f
  2. 30 1月, 2018 4 次提交
    • E
      ipv6: addrconf: break critical section in addrconf_verify_rtnl() · e64e469b
      Eric Dumazet 提交于
      Heiner reported a lockdep splat [1]
      
      This is caused by attempting GFP_KERNEL allocation while RCU lock is
      held and BH blocked.
      
      We believe that addrconf_verify_rtnl() could run for a long period,
      so instead of using GFP_ATOMIC here as Ido suggested, we should break
      the critical section and restart it after the allocation.
      
      [1]
      [86220.125562] =============================
      [86220.125586] WARNING: suspicious RCU usage
      [86220.125612] 4.15.0-rc7-next-20180110+ #7 Not tainted
      [86220.125641] -----------------------------
      [86220.125666] kernel/sched/core.c:6026 Illegal context switch in RCU-bh read-side critical section!
      [86220.125711]
                     other info that might help us debug this:
      
      [86220.125755]
                     rcu_scheduler_active = 2, debug_locks = 1
      [86220.125792] 4 locks held by kworker/0:2/1003:
      [86220.125817]  #0:  ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680
      [86220.125895]  #1:  ((addr_chk_work).work){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680
      [86220.125959]  #2:  (rtnl_mutex){+.+.}, at: [<00000000b06d9510>] rtnl_lock+0x12/0x20
      [86220.126017]  #3:  (rcu_read_lock_bh){....}, at: [<00000000aef52299>] addrconf_verify_rtnl+0x1e/0x510 [ipv6]
      [86220.126111]
                     stack backtrace:
      [86220.126142] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7
      [86220.126185] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015
      [86220.126250] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
      [86220.126288] Call Trace:
      [86220.126312]  dump_stack+0x70/0x9e
      [86220.126337]  lockdep_rcu_suspicious+0xce/0xf0
      [86220.126365]  ___might_sleep+0x1d3/0x240
      [86220.126390]  __might_sleep+0x45/0x80
      [86220.126416]  kmem_cache_alloc_trace+0x53/0x250
      [86220.126458]  ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
      [86220.126498]  ipv6_add_addr+0xfe/0x6e0 [ipv6]
      [86220.126538]  ipv6_create_tempaddr+0x24d/0x430 [ipv6]
      [86220.126580]  ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
      [86220.126623]  addrconf_verify_rtnl+0x339/0x510 [ipv6]
      [86220.126664]  ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
      [86220.126708]  addrconf_verify_work+0xe/0x20 [ipv6]
      [86220.126738]  process_one_work+0x258/0x680
      [86220.126765]  worker_thread+0x35/0x3f0
      [86220.126790]  kthread+0x124/0x140
      [86220.126813]  ? process_one_work+0x680/0x680
      [86220.126839]  ? kthread_create_worker_on_cpu+0x40/0x40
      [86220.126869]  ? umh_complete+0x40/0x40
      [86220.126893]  ? call_usermodehelper_exec_async+0x12a/0x160
      [86220.126926]  ret_from_fork+0x4b/0x60
      [86220.126999] BUG: sleeping function called from invalid context at mm/slab.h:420
      [86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: kworker/0:2
      [86220.127082] 4 locks held by kworker/0:2/1003:
      [86220.127107]  #0:  ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680
      [86220.127179]  #1:  ((addr_chk_work).work){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680
      [86220.127242]  #2:  (rtnl_mutex){+.+.}, at: [<00000000b06d9510>] rtnl_lock+0x12/0x20
      [86220.127300]  #3:  (rcu_read_lock_bh){....}, at: [<00000000aef52299>] addrconf_verify_rtnl+0x1e/0x510 [ipv6]
      [86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7
      [86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015
      [86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
      [86220.127568] Call Trace:
      [86220.127591]  dump_stack+0x70/0x9e
      [86220.127616]  ___might_sleep+0x14d/0x240
      [86220.127644]  __might_sleep+0x45/0x80
      [86220.127672]  kmem_cache_alloc_trace+0x53/0x250
      [86220.127717]  ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
      [86220.127762]  ipv6_add_addr+0xfe/0x6e0 [ipv6]
      [86220.127807]  ipv6_create_tempaddr+0x24d/0x430 [ipv6]
      [86220.127854]  ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
      [86220.127903]  addrconf_verify_rtnl+0x339/0x510 [ipv6]
      [86220.127950]  ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
      [86220.127998]  addrconf_verify_work+0xe/0x20 [ipv6]
      [86220.128032]  process_one_work+0x258/0x680
      [86220.128063]  worker_thread+0x35/0x3f0
      [86220.128091]  kthread+0x124/0x140
      [86220.128117]  ? process_one_work+0x680/0x680
      [86220.128146]  ? kthread_create_worker_on_cpu+0x40/0x40
      [86220.128180]  ? umh_complete+0x40/0x40
      [86220.128207]  ? call_usermodehelper_exec_async+0x12a/0x160
      [86220.128243]  ret_from_fork+0x4b/0x60
      
      Fixes: f3d9832e ("ipv6: addrconf: cleanup locking in ipv6_add_addr")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e64e469b
    • W
      ipv6: change route cache aging logic · 31afeb42
      Wei Wang 提交于
      In current route cache aging logic, if a route has both RTF_EXPIRE and
      RTF_GATEWAY set, the route will only be removed if the neighbor cache
      has no NTF_ROUTER flag. Otherwise, even if the route has expired, it
      won't get deleted.
      Fix this logic to always check if the route has expired first and then
      do the gateway neighbor cache check if previous check decide to not
      remove the exception entry.
      
      Fixes: 1859bac0 ("ipv6: remove from fib tree aged out RTF_CACHE dst")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31afeb42
    • D
      net: ipv6: send unsolicited NA after DAD · c76fe2d9
      David Ahern 提交于
      Unsolicited IPv6 neighbor advertisements should be sent after DAD
      completes. Update ndisc_send_unsol_na to skip tentative, non-optimistic
      addresses and have those sent by addrconf_dad_completed after DAD.
      
      Fixes: 4a6e3c5d ("net: ipv6: send unsolicited NA on admin up")
      Reported-by: NVivek Venkatraman <vivek@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c76fe2d9
    • M
      ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6only · 7ece54a6
      Martin KaFai Lau 提交于
      If a sk_v6_rcv_saddr is !IPV6_ADDR_ANY and !IPV6_ADDR_MAPPED, it
      implicitly implies it is an ipv6only socket.  However, in inet6_bind(),
      this addr_type checking and setting sk->sk_ipv6only to 1 are only done
      after sk->sk_prot->get_port(sk, snum) has been completed successfully.
      
      This inconsistency between sk_v6_rcv_saddr and sk_ipv6only confuses
      the 'get_port()'.
      
      In particular, when binding SO_REUSEPORT UDP sockets,
      udp_reuseport_add_sock(sk,...) is called.  udp_reuseport_add_sock()
      checks "ipv6_only_sock(sk2) == ipv6_only_sock(sk)" before adding sk to
      sk2->sk_reuseport_cb.  In this case, ipv6_only_sock(sk2) could be
      1 while ipv6_only_sock(sk) is still 0 here.  The end result is,
      reuseport_alloc(sk) is called instead of adding sk to the existing
      sk2->sk_reuseport_cb.
      
      It can be reproduced by binding two SO_REUSEPORT UDP sockets on an
      IPv6 address (!ANY and !MAPPED).  Only one of the socket will
      receive packet.
      
      The fix is to set the implicit sk_ipv6only before calling get_port().
      The original sk_ipv6only has to be saved such that it can be restored
      in case get_port() failed.  The situation is similar to the
      inet_reset_saddr(sk) after get_port() has failed.
      
      Thanks to Calvin Owens <calvinowens@fb.com> who created an easy
      reproduction which leads to a fix.
      
      Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ece54a6
  3. 26 1月, 2018 6 次提交
  4. 24 1月, 2018 1 次提交
  5. 23 1月, 2018 2 次提交
  6. 20 1月, 2018 2 次提交
  7. 19 1月, 2018 2 次提交
    • W
      ipv6: don't let tb6_root node share routes with other node · 591ff9ea
      Wei Wang 提交于
      After commit 4512c43e, if we add a route to the subtree of tb6_root
      which does not have any route attached to it yet, the current code will
      let tb6_root and the node in the subtree share the same route.
      This could cause problem cause tb6_root has RTN_INFO flag marked and the
      tree repair and clean up code will not work properly.
      This commit makes sure tb6_root->leaf points back to null_entry instead
      of sharing route with other node.
      
      It fixes the following syzkaller reported issue:
      BUG: KASAN: use-after-free in ipv6_prefix_equal include/net/ipv6.h:540 [inline]
      BUG: KASAN: use-after-free in fib6_add_1+0x165f/0x1790 net/ipv6/ip6_fib.c:618
      Read of size 8 at addr ffff8801bc043498 by task syz-executor5/19819
      
      CPU: 1 PID: 19819 Comm: syz-executor5 Not tainted 4.15.0-rc7+ #186
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x25b/0x340 mm/kasan/report.c:409
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
       ipv6_prefix_equal include/net/ipv6.h:540 [inline]
       fib6_add_1+0x165f/0x1790 net/ipv6/ip6_fib.c:618
       fib6_add+0x5fa/0x1540 net/ipv6/ip6_fib.c:1214
       __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1003
       ip6_route_add+0x141/0x190 net/ipv6/route.c:2790
       ipv6_route_ioctl+0x4db/0x6b0 net/ipv6/route.c:3299
       inet6_ioctl+0xef/0x1e0 net/ipv6/af_inet6.c:520
       sock_do_ioctl+0x65/0xb0 net/socket.c:958
       sock_ioctl+0x2c2/0x440 net/socket.c:1055
       vfs_ioctl fs/ioctl.c:46 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
       SYSC_ioctl fs/ioctl.c:701 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
       entry_SYSCALL_64_fastpath+0x23/0x9a
      RIP: 0033:0x452ac9
      RSP: 002b:00007fd42b321c58 EFLAGS: 00000212 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 000000000071bea0 RCX: 0000000000452ac9
      RDX: 0000000020fd7000 RSI: 000000000000890b RDI: 0000000000000013
      RBP: 000000000000049e R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000212 R12: 00000000006f4f70
      R13: 00000000ffffffff R14: 00007fd42b3226d4 R15: 0000000000000000
      
      Fixes: 4512c43e ("ipv6: remove null_entry before adding default route")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      591ff9ea
    • A
      ip6_gre: init dev->mtu and dev->hard_header_len correctly · 128bb975
      Alexey Kodanev 提交于
      Commit b05229f4 ("gre6: Cleanup GREv6 transmit path,
      call common GRE functions") moved dev->mtu initialization
      from ip6gre_tunnel_setup() to ip6gre_tunnel_init(), as a
      result, the previously set values, before ndo_init(), are
      reset in the following cases:
      
      * rtnl_create_link() can update dev->mtu from IFLA_MTU
        parameter.
      
      * ip6gre_tnl_link_config() is invoked before ndo_init() in
        netlink and ioctl setup, so ndo_init() can reset MTU
        adjustments with the lower device MTU as well, dev->mtu
        and dev->hard_header_len.
      
        Not applicable for ip6gretap because it has one more call
        to ip6gre_tnl_link_config(tunnel, 1) in ip6gre_tap_init().
      
      Fix the first case by updating dev->mtu with 'tb[IFLA_MTU]'
      parameter if a user sets it manually on a device creation,
      and fix the second one by moving ip6gre_tnl_link_config()
      call after register_netdevice().
      
      Fixes: b05229f4 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
      Fixes: db2ec95d ("ip6_gre: Fix MTU setting")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      128bb975
  8. 17 1月, 2018 1 次提交
    • A
      net: delete /proc THIS_MODULE references · 96890d62
      Alexey Dobriyan 提交于
      /proc has been ignoring struct file_operations::owner field for 10 years.
      Specifically, it started with commit 786d7e16
      ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
      inode->i_fop is initialized with proxy struct file_operations for
      regular files:
      
      	-               if (de->proc_fops)
      	-                       inode->i_fop = de->proc_fops;
      	+               if (de->proc_fops) {
      	+                       if (S_ISREG(inode->i_mode))
      	+                               inode->i_fop = &proc_reg_file_ops;
      	+                       else
      	+                               inode->i_fop = de->proc_fops;
      	+               }
      
      VFS stopped pinning module at this point.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96890d62
  9. 16 1月, 2018 6 次提交
    • A
      netfilter: nf_defrag: move NF_CONNTRACK bits into #ifdef · 41e4b391
      Arnd Bergmann 提交于
      We cannot access the skb->_nfct field when CONFIG_NF_CONNTRACK is
      disabled:
      
      net/ipv4/netfilter/nf_defrag_ipv4.c: In function 'ipv4_conntrack_defrag':
      net/ipv4/netfilter/nf_defrag_ipv4.c:83:9: error: 'struct sk_buff' has no member named '_nfct'
      net/ipv6/netfilter/nf_defrag_ipv6_hooks.c: In function 'ipv6_defrag':
      net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68:9: error: 'struct sk_buff' has no member named '_nfct'
      
      Both functions already have an #ifdef for this, so let's move the
      check in there.
      
      Fixes: 902d6a4c ("netfilter: nf_defrag: Skip defrag if NOTRACK is set")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      41e4b391
    • A
      netfilter: nf_defrag: mark xt_table structures 'const' again · b069b37a
      Arnd Bergmann 提交于
      As a side-effect of adding the module option, we now get a section
      mismatch warning:
      
      WARNING: net/ipv4/netfilter/iptable_raw.o(.data+0x1c): Section mismatch in reference from the variable packet_raw to the function .init.text:iptable_raw_table_init()
      The variable packet_raw references
      the function __init iptable_raw_table_init()
      If the reference is valid then annotate the
      variable with __init* or __refdata (see linux/init.h) or name the variable:
      *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console
      
      Apparently it's ok to link to a __net_init function from .rodata but not
      from .data. We can address this by rearranging the logic so that the
      structure is read-only again. Instead of writing to the .priority field
      later, we have an extra copies of the structure with that flag. An added
      advantage is that that we don't have writable function pointers with this
      approach.
      
      Fixes: 902d6a4c ("netfilter: nf_defrag: Skip defrag if NOTRACK is set")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b069b37a
    • S
      netfilter: ipv6: nf_defrag: Pass on packets to stack per RFC2460 · 83f1999c
      Subash Abhinov Kasiviswanathan 提交于
      ipv6_defrag pulls network headers before fragment header. In case of
      an error, the netfilter layer is currently dropping these packets.
      This results in failure of some IPv6 standards tests which passed on
      older kernels due to the netfilter framework using cloning.
      
      The test case run here is a check for ICMPv6 error message replies
      when some invalid IPv6 fragments are sent. This specific test case is
      listed in https://www.ipv6ready.org/docs/Core_Conformance_Latest.pdf
      in the Extension Header Processing Order section.
      
      A packet with unrecognized option Type 11 is sent and the test expects
      an ICMP error in line with RFC2460 section 4.2 -
      
      11 - discard the packet and, only if the packet's Destination
           Address was not a multicast address, send an ICMP Parameter
           Problem, Code 2, message to the packet's Source Address,
           pointing to the unrecognized Option Type.
      
      Since netfilter layer now drops all invalid IPv6 frag packets, we no
      longer see the ICMP error message and fail the test case.
      
      To fix this, save the transport header. If defrag is unable to process
      the packet due to RFC2460, restore the transport header and allow packet
      to be processed by stack. There is no change for other packet
      processing paths.
      
      Tested by confirming that stack sends an ICMP error when it receives
      these packets. Also tested that fragmented ICMP pings succeed.
      
      v1->v2: Instead of cloning always, save the transport_header and
      restore it in case of this specific error. Update the title and
      commit message accordingly.
      Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      83f1999c
    • I
      ipv6: Fix build with gcc-4.4.5 · 6802f3ad
      Ido Schimmel 提交于
      Emil reported the following compiler errors:
      
      net/ipv6/route.c: In function `rt6_sync_up`:
      net/ipv6/route.c:3586: error: unknown field `nh_flags` specified in initializer
      net/ipv6/route.c:3586: warning: missing braces around initializer
      net/ipv6/route.c:3586: warning: (near initialization for `arg.<anonymous>`)
      net/ipv6/route.c: In function `rt6_sync_down_dev`:
      net/ipv6/route.c:3695: error: unknown field `event` specified in initializer
      net/ipv6/route.c:3695: warning: missing braces around initializer
      net/ipv6/route.c:3695: warning: (near initialization for `arg.<anonymous>`)
      
      Problem is with the named initializers for the anonymous union members.
      Fix this by adding curly braces around the initialization.
      
      Fixes: 4c981e28 ("ipv6: Prepare to handle multiple netdev events")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reported-by: NEmil S Tantilov <emils.tantilov@gmail.com>
      Tested-by: NEmil S Tantilov <emils.tantilov@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6802f3ad
    • E
      ipv6: ip6_make_skb() needs to clear cork.base.dst · 95ef498d
      Eric Dumazet 提交于
      In my last patch, I missed fact that cork.base.dst was not initialized
      in ip6_make_skb() :
      
      If ip6_setup_cork() returns an error, we might attempt a dst_release()
      on some random pointer.
      
      Fixes: 862c03ee ("ipv6: fix possible mem leaks in ipv6_make_skb()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95ef498d
    • M
      ipv6: fix udpv6 sendmsg crash caused by too small MTU · 749439bf
      Mike Maloney 提交于
      The logic in __ip6_append_data() assumes that the MTU is at least large
      enough for the headers.  A device's MTU may be adjusted after being
      added while sendmsg() is processing data, resulting in
      __ip6_append_data() seeing any MTU.  For an mtu smaller than the size of
      the fragmentation header, the math results in a negative 'maxfraglen',
      which causes problems when refragmenting any previous skb in the
      skb_write_queue, leaving it possibly malformed.
      
      Instead sendmsg returns EINVAL when the mtu is calculated to be less
      than IPV6_MIN_MTU.
      
      Found by syzkaller:
      kernel BUG at ./include/linux/skbuff.h:2064!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801d0b68580 task.stack: ffff8801ac6b8000
      RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline]
      RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617
      RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216
      RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000
      RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0
      RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000
      R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8
      R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000
      FS:  00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       ip6_finish_skb include/net/ipv6.h:911 [inline]
       udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093
       udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363
       inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       SYSC_sendto+0x352/0x5a0 net/socket.c:1750
       SyS_sendto+0x40/0x50 net/socket.c:1718
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4512e9
      RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9
      RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005
      RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c
      R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69
      R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000
      Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba
      RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570
      RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NMike Maloney <maloney@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      749439bf
  10. 11 1月, 2018 8 次提交
    • S
      netfilter: nf_defrag: Skip defrag if NOTRACK is set · 902d6a4c
      Subash Abhinov Kasiviswanathan 提交于
      conntrack defrag is needed only if some module like CONNTRACK or NAT
      explicitly requests it. For plain forwarding scenarios, defrag is
      not needed and can be skipped if NOTRACK is set in a rule.
      
      Since conntrack defrag is currently higher priority than raw table,
      setting NOTRACK is not sufficient. We need to move raw to a higher
      priority for iptables only.
      
      This is achieved by introducing a module parameter "raw_before_defrag"
      which allows to change the priority of raw table to place it before
      defrag. By default, the parameter is disabled and the priority of raw
      table is NF_IP_PRI_RAW to support legacy behavior. If the module
      parameter is enabled, then the priority of the raw table is set to
      NF_IP_PRI_RAW_BEFORE_DEFRAG.
      Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      902d6a4c
    • M
      ipv6: sr: fix TLVs not being copied using setsockopt · ccc12b11
      Mathieu Xhonneux 提交于
      Function ipv6_push_rthdr4 allows to add an IPv6 Segment Routing Header
      to a socket through setsockopt, but the current implementation doesn't
      copy possible TLVs at the end of the SRH received from userspace.
      
      Therefore, the execution of the following branch if (sr_has_hmac(sr_phdr))
      { ... } will never complete since the len and type fields of a possible
      HMAC TLV are not copied, hence seg6_get_tlv_hmac will return an error,
      and the HMAC will not be computed.
      
      This commit adds a memcpy in case TLVs have been appended to the SRH.
      
      Fixes: a149e7c7 ("ipv6: sr: add support for SRH injection through setsockopt")
      Acked-by: NDavid Lebrun <dlebrun@google.com>
      Signed-off-by: NMathieu Xhonneux <m.xhonneux@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccc12b11
    • E
      ipv6: fix possible mem leaks in ipv6_make_skb() · 862c03ee
      Eric Dumazet 提交于
      ip6_setup_cork() might return an error, while memory allocations have
      been done and must be rolled back.
      
      Fixes: 6422398c ("ipv6: introduce ipv6_make_skb")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Reported-by: NMike Maloney <maloney@google.com>
      Acked-by: NMike Maloney <maloney@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      862c03ee
    • I
      ipv6: Add support for non-equal-cost multipath · 398958ae
      Ido Schimmel 提交于
      The use of hash-threshold instead of modulo-N makes it trivial to add
      support for non-equal-cost multipath.
      
      Instead of dividing the multipath hash function's output space equally
      between the nexthops, each nexthop is assigned a region size which is
      proportional to its weight.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      398958ae
    • I
      ipv6: Use hash-threshold instead of modulo-N · 3d709f69
      Ido Schimmel 提交于
      Now that each nexthop stores its region boundary in the multipath hash
      function's output space, we can use hash-threshold instead of modulo-N
      in multipath selection.
      
      This reduces the number of checks we need to perform during lookup, as
      dead and linkdown nexthops are assigned a negative region boundary. In
      addition, in contrast to modulo-N, only flows near region boundaries are
      affected when a nexthop is added or removed.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3d709f69
    • I
      ipv6: Use a 31-bit multipath hash · 7696c06a
      Ido Schimmel 提交于
      The hash thresholds assigned to IPv6 nexthops are in the range of
      [-1, 2^31 - 1], where a negative value is assigned to nexthops that
      should not be considered during multipath selection.
      
      Therefore, in a similar fashion to IPv4, we need to use the upper
      31-bits of the multipath hash for multipath selection.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7696c06a
    • I
      ipv6: Calculate hash thresholds for IPv6 nexthops · d7dedee1
      Ido Schimmel 提交于
      Before we convert IPv6 to use hash-threshold instead of modulo-N, we
      first need each nexthop to store its region boundary in the hash
      function's output space.
      
      The boundary is calculated by dividing the output space equally between
      the different active nexthops. That is, nexthops that are not dead or
      linkdown.
      
      The boundaries are rebalanced whenever a nexthop is added or removed to
      a multipath route and whenever a nexthop becomes active or inactive.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7dedee1
    • A
      netfilter: improve flow table Kconfig dependencies · a0a97f2a
      Arnd Bergmann 提交于
      The newly added NF_FLOW_TABLE options cause some build failures in
      randconfig kernels:
      
      - when CONFIG_NF_CONNTRACK is disabled, or is a loadable module but
        NF_FLOW_TABLE is built-in:
      
        In file included from net/netfilter/nf_flow_table.c:8:0:
        include/net/netfilter/nf_conntrack.h:59:22: error: field 'ct_general' has incomplete type
          struct nf_conntrack ct_general;
        include/net/netfilter/nf_conntrack.h: In function 'nf_ct_get':
        include/net/netfilter/nf_conntrack.h:148:15: error: 'const struct sk_buff' has no member named '_nfct'
        include/net/netfilter/nf_conntrack.h: In function 'nf_ct_put':
        include/net/netfilter/nf_conntrack.h:157:2: error: implicit declaration of function 'nf_conntrack_put'; did you mean 'nf_ct_put'? [-Werror=implicit-function-declaration]
      
        net/netfilter/nf_flow_table.o: In function `nf_flow_offload_work_gc':
        (.text+0x1540): undefined reference to `nf_ct_delete'
      
      - when CONFIG_NF_TABLES is disabled:
      
        In file included from net/ipv6/netfilter/nf_flow_table_ipv6.c:13:0:
        include/net/netfilter/nf_tables.h: In function 'nft_gencursor_next':
        include/net/netfilter/nf_tables.h:1189:14: error: 'const struct net' has no member named 'nft'; did you mean 'nf'?
      
       - when CONFIG_NF_FLOW_TABLE_INET is enabled, but NF_FLOW_TABLE_IPV4
        or NF_FLOW_TABLE_IPV6 are not, or are loadable modules
      
        net/netfilter/nf_flow_table_inet.o: In function `nf_flow_offload_inet_hook':
        nf_flow_table_inet.c:(.text+0x94): undefined reference to `nf_flow_offload_ipv6_hook'
        nf_flow_table_inet.c:(.text+0x40): undefined reference to `nf_flow_offload_ip_hook'
      
      - when CONFIG_NF_FLOW_TABLES is disabled, but the other options are
        enabled:
      
        net/netfilter/nf_flow_table_inet.o: In function `nf_flow_offload_inet_hook':
        nf_flow_table_inet.c:(.text+0x6c): undefined reference to `nf_flow_offload_ipv6_hook'
        net/netfilter/nf_flow_table_inet.o: In function `nf_flow_inet_module_exit':
        nf_flow_table_inet.c:(.exit.text+0x8): undefined reference to `nft_unregister_flowtable_type'
        net/netfilter/nf_flow_table_inet.o: In function `nf_flow_inet_module_init':
        nf_flow_table_inet.c:(.init.text+0x8): undefined reference to `nft_register_flowtable_type'
        net/ipv4/netfilter/nf_flow_table_ipv4.o: In function `nf_flow_ipv4_module_exit':
        nf_flow_table_ipv4.c:(.exit.text+0x8): undefined reference to `nft_unregister_flowtable_type'
        net/ipv4/netfilter/nf_flow_table_ipv4.o: In function `nf_flow_ipv4_module_init':
        nf_flow_table_ipv4.c:(.init.text+0x8): undefined reference to `nft_register_flowtable_type'
      
      This adds additional Kconfig dependencies to ensure that NF_CONNTRACK and NF_TABLES
      are always visible from NF_FLOW_TABLE, and that the internal dependencies between
      the four new modules are met.
      
      Fixes: 7c23b629 ("netfilter: flow table support for the mixed IPv4/IPv6 family")
      Fixes: 09952107 ("netfilter: flow table support for IPv6")
      Fixes: 97add9f0 ("netfilter: flow table support for IPv4")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      a0a97f2a
  11. 10 1月, 2018 6 次提交
    • A
      netfilter: add IPv6 segment routing header 'srh' match · 202a8ff5
      Ahmed Abdelsalam 提交于
      It allows matching packets based on Segment Routing Header
      (SRH) information.
      The implementation considers revision 7 of the SRH draft.
      https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07
      
      Currently supported match options include:
      (1) Next Header
      (2) Hdr Ext Len
      (3) Segments Left
      (4) Last Entry
      (5) Tag value of SRH
      Signed-off-by: NAhmed Abdelsalam <amsalam20@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      202a8ff5
    • W
      netfilter: remove duplicated include · 99eadf67
      Wei Yongjun 提交于
      Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      99eadf67
    • P
      netfilter: nf_tables: get rid of struct nft_af_info abstraction · 98319cb9
      Pablo Neira Ayuso 提交于
      Remove the infrastructure to register/unregister nft_af_info structure,
      this structure stores no useful information anymore.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      98319cb9
    • P
      netfilter: nf_tables: get rid of pernet families · dd4cbef7
      Pablo Neira Ayuso 提交于
      Now that we have a single table list for each netns, we can get rid of
      one pointer per family and the global afinfo list, thus, shrinking
      struct netns for nftables that now becomes 64 bytes smaller.
      
      And call __nft_release_afinfo() from __net_exit path accordingly to
      release netnamespace objects on removal.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      dd4cbef7
    • P
      netfilter: nf_tables: remove nhooks field from struct nft_af_info · fe19c04c
      Pablo Neira Ayuso 提交于
      We already validate the hook through bitmask, so this check is
      superfluous. When removing this, this patch is also fixing a bug in the
      new flowtable codebase, since ctx->afi points to the table family
      instead of the netdev family which is where the flowtable is really
      hooked in.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      fe19c04c
    • W
      ipv6: remove null_entry before adding default route · 4512c43e
      Wei Wang 提交于
      In the current code, when creating a new fib6 table, tb6_root.leaf gets
      initialized to net->ipv6.ip6_null_entry.
      If a default route is being added with rt->rt6i_metric = 0xffffffff,
      fib6_add() will add this route after net->ipv6.ip6_null_entry. As
      null_entry is shared, it could cause problem.
      
      In order to fix it, set fn->leaf to NULL before calling
      fib6_add_rt2node() when trying to add the first default route.
      And reset fn->leaf to null_entry when adding fails or when deleting the
      last default route.
      
      syzkaller reported the following issue which is fixed by this commit:
      
      WARNING: suspicious RCU usage
      4.15.0-rc5+ #171 Not tainted
      -----------------------------
      net/ipv6/ip6_fib.c:1702 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      4 locks held by swapper/0/0:
       #0:  ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] lockdep_copy_map include/linux/lockdep.h:178 [inline]
       #0:  ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1310
       #1:  (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] spin_lock_bh include/linux/spinlock.h:315 [inline]
       #1:  (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] fib6_run_gc+0x9d/0x3c0 net/ipv6/ip6_fib.c:2007
       #2:  (rcu_read_lock){....}, at: [<0000000091db762d>] __fib6_clean_all+0x0/0x3a0 net/ipv6/ip6_fib.c:1560
       #3:  (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] spin_lock_bh include/linux/spinlock.h:315 [inline]
       #3:  (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] __fib6_clean_all+0x1d0/0x3a0 net/ipv6/ip6_fib.c:1948
      
      stack backtrace:
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #171
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:53
       lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585
       fib6_del+0xcaa/0x11b0 net/ipv6/ip6_fib.c:1701
       fib6_clean_node+0x3aa/0x4f0 net/ipv6/ip6_fib.c:1892
       fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1815
       fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1863
       fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1933
       __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1949
       fib6_clean_all net/ipv6/ip6_fib.c:1960 [inline]
       fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2016
       fib6_gc_timer_cb+0x20/0x30 net/ipv6/ip6_fib.c:2033
       call_timer_fn+0x228/0x820 kernel/time/timer.c:1320
       expire_timers kernel/time/timer.c:1357 [inline]
       __run_timers+0x7ee/0xb70 kernel/time/timer.c:1660
       run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686
       __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
       invoke_softirq kernel/softirq.c:365 [inline]
       irq_exit+0x1cc/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:540 [inline]
       smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:904
       </IRQ>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Fixes: 66f5d6ce ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4512c43e