1. 27 Jun, 2018 1 commit
  2. 16 Jun, 2018 1 commit
  3. 16 May, 2018 1 commit
  4. 26 Apr, 2018 1 commit
  5. 13 Apr, 2018 2 commits
    • net: fix deadlock while clearing neighbor proxy table · 53b76cdf
      Committed by Wolfgang Bumiller
      When coming from ndisc_netdev_event() in net/ipv6/ndisc.c,
      neigh_ifdown() is called with &nd_tbl, locking this while
      clearing the proxy neighbor entries when e.g. deleting an
      interface. Calling the table's pndisc_destructor() with the
      lock still held, however, can cause a deadlock: when a
      multicast listener is available, an MLD packet of type
      ICMPV6_MGM_REDUCTION may be sent out. When reaching
      ip6_finish_output2(), if no neighbor entry for the target
      address is found, __neigh_create() is called with &nd_tbl,
      which it will then try to lock again.
      
      Move the elements into their own list, then unlock the table
      and perform the destruction.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199289
      Fixes: 6fd6ce20 ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
      Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
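The fix described above (detach the matching entries while the table is locked, then run the destructors after unlocking) can be sketched in user space as follows. This is a minimal illustration, not the kernel code: `struct pentry`, `struct table`, `table_ifdown()` and the integer `locked` flag that stands in for the table lock are all hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are not the kernel's types. */
struct pentry {
    struct pentry *next;
    int ifindex;
};

struct table {
    int locked;                 /* models the table lock being held */
    struct pentry *head;
    int destroyed;              /* destructor calls in total */
    int destroyed_while_locked; /* destructor calls made under the lock */
};

static void table_lock(struct table *t)   { assert(!t->locked); t->locked = 1; }
static void table_unlock(struct table *t) { assert(t->locked);  t->locked = 0; }

/* Stands in for a destructor that may itself need the table lock. */
static void entry_destroy(struct table *t, struct pentry *e)
{
    if (t->locked)
        t->destroyed_while_locked++;
    t->destroyed++;
    free(e);
}

static int table_add(struct table *t, int ifindex)
{
    struct pentry *e = malloc(sizeof(*e));

    if (!e)
        return -1;
    e->ifindex = ifindex;
    table_lock(t);
    e->next = t->head;
    t->head = e;
    table_unlock(t);
    return 0;
}

/* Unlink matching entries under the lock, destroy them after unlocking. */
static void table_ifdown(struct table *t, int ifindex)
{
    struct pentry *freelist = NULL, **pp, *e;

    table_lock(t);
    pp = &t->head;
    while ((e = *pp) != NULL) {
        if (e->ifindex == ifindex) {
            *pp = e->next;        /* unlink from the table */
            e->next = freelist;   /* park on a private list */
            freelist = e;
        } else {
            pp = &e->next;
        }
    }
    table_unlock(t);

    while ((e = freelist) != NULL) {
        freelist = e->next;
        entry_destroy(t, e);      /* runs with the lock released */
    }
}
```

Because the private list is only reachable from the caller's stack, no other path can see the entries between the unlock and the destructor calls.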
    • net: validate attribute sizes in neigh_dump_table() · 7dd07c14
      Committed by Eric Dumazet
      Since neigh_dump_table() calls nlmsg_parse() without giving policy
      constraints, attributes can have arbitrary sizes that we must validate.
      
      Reported by syzbot/KMSAN :
      
      BUG: KMSAN: uninit-value in neigh_master_filtered net/core/neighbour.c:2292 [inline]
      BUG: KMSAN: uninit-value in neigh_dump_table net/core/neighbour.c:2348 [inline]
      BUG: KMSAN: uninit-value in neigh_dump_info+0x1af0/0x2250 net/core/neighbour.c:2438
      CPU: 1 PID: 3575 Comm: syzkaller268891 Not tainted 4.16.0+ #83
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       neigh_master_filtered net/core/neighbour.c:2292 [inline]
       neigh_dump_table net/core/neighbour.c:2348 [inline]
       neigh_dump_info+0x1af0/0x2250 net/core/neighbour.c:2438
       netlink_dump+0x9ad/0x1540 net/netlink/af_netlink.c:2225
       __netlink_dump_start+0x1167/0x12a0 net/netlink/af_netlink.c:2322
       netlink_dump_start include/linux/netlink.h:214 [inline]
       rtnetlink_rcv_msg+0x1435/0x1560 net/core/rtnetlink.c:4598
       netlink_rcv_skb+0x355/0x5f0 net/netlink/af_netlink.c:2447
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4653
       netlink_unicast_kernel net/netlink/af_netlink.c:1311 [inline]
       netlink_unicast+0x1672/0x1750 net/netlink/af_netlink.c:1337
       netlink_sendmsg+0x1048/0x1310 net/netlink/af_netlink.c:1900
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
       __sys_sendmsg net/socket.c:2080 [inline]
       SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091
       SyS_sendmsg+0x54/0x80 net/socket.c:2087
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x43fed9
      RSP: 002b:00007ffddbee2798 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fed9
      RDX: 0000000000000000 RSI: 0000000020005000 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401800
      R13: 0000000000401890 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
       kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
       slab_post_alloc_hook mm/slab.h:445 [inline]
       slab_alloc_node mm/slub.c:2737 [inline]
       __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
       __kmalloc_reserve net/core/skbuff.c:138 [inline]
       __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
       alloc_skb include/linux/skbuff.h:984 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1183 [inline]
       netlink_sendmsg+0x9a6/0x1310 net/netlink/af_netlink.c:1875
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
       __sys_sendmsg net/socket.c:2080 [inline]
       SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091
       SyS_sendmsg+0x54/0x80 net/socket.c:2087
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 21fdd092 ("net: Add support for filtering neigh dump by master device")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
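The kind of check this commit message calls for can be illustrated with a small user-space sketch: before reading a 32-bit value out of a length-prefixed attribute, confirm that the declared payload is large enough. The `struct attr_hdr` layout and the `attr_read_u32()` helper are assumptions made for this example; they only resemble netlink attributes and are not the kernel's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_HDRLEN 4   /* hypothetical header size: 16-bit len + 16-bit type */

struct attr_hdr {
    uint16_t len;   /* total length, header included */
    uint16_t type;
};

/*
 * Copy a u32 payload out of the attribute at buf.
 * Returns 0 on success, -1 if the attribute is too short to hold a u32.
 */
static int attr_read_u32(const unsigned char *buf, size_t buflen, uint32_t *out)
{
    struct attr_hdr h;

    assert(buf != NULL && out != NULL);
    if (buflen < ATTR_HDRLEN)
        return -1;
    memcpy(&h, buf, sizeof(h));
    if (h.len < ATTR_HDRLEN || h.len > buflen)
        return -1;                       /* malformed length field */
    if ((size_t)(h.len - ATTR_HDRLEN) < sizeof(uint32_t))
        return -1;                       /* payload too small: do not read */
    memcpy(out, buf + ATTR_HDRLEN, sizeof(*out));
    return 0;
}
```

Without the payload-size check, a sender could declare a header-only attribute and the reader would consume whatever bytes happen to follow it, which is the class of uninitialized read that KMSAN reports in the trace above.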
  6. 17 Jan, 2018 1 commit
    • net: delete /proc THIS_MODULE references · 96890d62
      Committed by Alexey Dobriyan
      /proc has been ignoring struct file_operations::owner field for 10 years.
      Specifically, it started with commit 786d7e16
      ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
      inode->i_fop is initialized with proxy struct file_operations for
      regular files:
      
      	-               if (de->proc_fops)
      	-                       inode->i_fop = de->proc_fops;
      	+               if (de->proc_fops) {
      	+                       if (S_ISREG(inode->i_mode))
      	+                               inode->i_fop = &proc_reg_file_ops;
      	+                       else
      	+                               inode->i_fop = de->proc_fops;
      	+               }
      
      VFS stopped pinning the module at this point.

      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 16 Jan, 2018 1 commit
  8. 22 Nov, 2017 1 commit
    • treewide: setup_timer() -> timer_setup() · e99e88a9
      Committed by Kees Cook
      This converts all remaining cases of the old setup_timer() API into using
      timer_setup(), where the callback argument is the structure already
      holding the struct timer_list. These should have no behavioral changes,
      since they just change which pointer is passed into the callback with
      the same available pointers after conversion. It handles the following
      examples, in addition to some other variations.
      
      Casting from unsigned long:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, ptr);
      
      and forced object casts:
      
          void my_callback(struct something *ptr)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);
      
      become:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      Direct function assignments:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          ptr->my_timer.function = my_callback;
      
      have a temporary cast added, along with converting the args:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;
      
      And finally, callbacks without a data assignment:
      
          void my_callback(unsigned long data)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, 0);
      
      have their argument renamed to verify they're unused during conversion:
      
          void my_callback(struct timer_list *unused)
          {
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      The conversion is done with the following Coccinelle script:
      
      spatch --very-quiet --all-includes --include-headers \
      	-I ./arch/x86/include -I ./arch/x86/include/generated \
      	-I ./include -I ./arch/x86/include/uapi \
      	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
      	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
      	--dir . \
      	--cocci-file ~/src/data/timer_setup.cocci
      
      @fix_address_of@
      expression e;
      @@
      
       setup_timer(
      -&(e)
      +&e
       , ...)
      
      // Update any raw setup_timer() usages that have a NULL callback, but
      // would otherwise match change_timer_function_usage, since the latter
      // will update all function assignments done in the face of a NULL
      // function initialization in setup_timer().
      @change_timer_function_usage_NULL@
      expression _E;
      identifier _timer;
      type _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, NULL, _E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, &_E);
      +timer_setup(&_E._timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
      +timer_setup(&_E._timer, NULL, 0);
      )
      
      @change_timer_function_usage@
      expression _E;
      identifier _timer;
      struct timer_list _stl;
      identifier _callback;
      type _cast_func, _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
       _E->_timer@_stl.function = _callback;
      |
       _E->_timer@_stl.function = &_callback;
      |
       _E->_timer@_stl.function = (_cast_func)_callback;
      |
       _E->_timer@_stl.function = (_cast_func)&_callback;
      |
       _E._timer@_stl.function = _callback;
      |
       _E._timer@_stl.function = &_callback;
      |
       _E._timer@_stl.function = (_cast_func)_callback;
      |
       _E._timer@_stl.function = (_cast_func)&_callback;
      )
      
      // callback(unsigned long arg)
      @change_callback_handle_cast
       depends on change_timer_function_usage@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      (
      	... when != _origarg
      	_handletype *_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      )
       }
      
      // callback(unsigned long arg) without existing variable
      @change_callback_handle_cast_no_arg
       depends on change_timer_function_usage &&
                           !change_callback_handle_cast@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      +	_handletype *_origarg = from_timer(_origarg, t, _timer);
      +
      	... when != _origarg
      -	(_handletype *)_origarg
      +	_origarg
      	... when != _origarg
       }
      
      // Avoid already converted callbacks.
      @match_callback_converted
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
      	    !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       { ... }
      
      // callback(struct something *handle)
      @change_callback_handle_arg
       depends on change_timer_function_usage &&
      	    !match_callback_converted &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_handletype *_handle
      +struct timer_list *t
       )
       {
      +	_handletype *_handle = from_timer(_handle, t, _timer);
      	...
       }
      
      // If change_callback_handle_arg ran on an empty function, remove
      // the added handler.
      @unchange_callback_handle_arg
       depends on change_timer_function_usage &&
      	    change_callback_handle_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       {
      -	_handletype *_handle = from_timer(_handle, t, _timer);
       }
      
      // We only want to refactor the setup_timer() data argument if we've found
      // the matching callback. This undoes changes in change_timer_function_usage.
      @unchange_timer_function_usage
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg &&
      	    !change_callback_handle_arg@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type change_timer_function_usage._cast_data;
      @@
      
      (
      -timer_setup(&_E->_timer, _callback, 0);
      +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      |
      -timer_setup(&_E._timer, _callback, 0);
      +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      )
      
      // If we fixed a callback from a .function assignment, fix the
      // assignment cast now.
      @change_timer_function_assignment
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_func;
      typedef TIMER_FUNC_TYPE;
      @@
      
      (
       _E->_timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -&_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      )
      
      // Sometimes timer functions are called directly. Replace matched args.
      @change_timer_function_calls
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression _E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_data;
      @@
      
       _callback(
      (
      -(_cast_data)_E
      +&_E->_timer
      |
      -(_cast_data)&_E
      +&_E._timer
      |
      -_E
      +&_E->_timer
      )
       )
      
      // If a timer has been configured without a data argument, it can be
      // converted without regard to the callback argument, since it is unused.
      @match_timer_function_unused_data@
      expression _E;
      identifier _timer;
      identifier _callback;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, 0);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0L);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0UL);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0L);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0UL);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0L);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0UL);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0L);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0UL);
      +timer_setup(_timer, _callback, 0);
      )
      
      @change_callback_unused_data
       depends on match_timer_function_unused_data@
      identifier match_timer_function_unused_data._callback;
      type _origtype;
      identifier _origarg;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *unused
       )
       {
      	... when != _origarg
       }

      Signed-off-by: Kees Cook <keescook@chromium.org>
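The converted callbacks recover their enclosing object from the timer pointer. The following user-space sketch shows the underlying pointer arithmetic, assuming a container_of()-style macro built on offsetof(). `struct timer_sketch`, `from_timer_sketch()` and `timer_setup_sketch()` are hypothetical stand-ins with simplified signatures, not the kernel's timer API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a timer object: it only stores the callback. */
struct timer_sketch {
    void (*function)(struct timer_sketch *);
};

/* Recover the enclosing structure from a pointer to its timer member. */
#define from_timer_sketch(type, t, member) \
    ((type *)((char *)(t) - offsetof(type, member)))

struct something {
    int fired;
    struct timer_sketch my_timer;
};

/* New-style callback: receives the timer, derives its container. */
static void my_callback(struct timer_sketch *t)
{
    struct something *ptr = from_timer_sketch(struct something, t, my_timer);

    ptr->fired++;
}

/* No data argument is stored: the timer pointer itself carries the context. */
static void timer_setup_sketch(struct timer_sketch *t,
                               void (*fn)(struct timer_sketch *))
{
    assert(fn != NULL);
    t->function = fn;
}
```

Passing the timer instead of an `unsigned long` cookie keeps the callback type-checked and removes the need for casts at both the setup site and inside the callback.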
  9. 26 Sep, 2017 1 commit
  10. 10 Aug, 2017 1 commit
  11. 01 Jul, 2017 2 commits
  12. 20 Jun, 2017 1 commit
  13. 05 Jun, 2017 1 commit
    • neigh: Really delete an arp/neigh entry on "ip neigh delete" or "arp -d" · 5071034e
      Committed by Sowmini Varadhan
      The command
        # arp -s 62.2.0.1 a:b:c:d:e:f dev eth2
      adds an entry like the following (listed by "arp -an")
        ? (62.2.0.1) at 0a:0b:0c:0d:0e:0f [ether] PERM on eth2
      but the symmetric deletion command
        # arp -i eth2 -d 62.2.0.1
      does not remove the PERM entry from the table, and instead leaves behind
        ? (62.2.0.1) at <incomplete> on eth2
      
      The reason is that there is a refcnt of 1 for the arp_tbl itself
      (neigh_alloc starts off the entry with a refcnt of 1), thus
      the neigh_release() call from arp_invalidate() will (at best) just
      decrement the ref to 1, but will never actually free it from the
      table.
      
      To fix this, we need to do something like neigh_forced_gc: if
      the refcnt is 1 (i.e., only the table's own reference remains), remove
      the entry from the table and free it. This patch refactors and shares
      common code between neigh_forced_gc and the newly added neigh_remove_one.
      
      A similar issue exists for IPv6 Neighbor Cache entries, and is fixed
      in a similar manner by this patch.

      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
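The rule described above (an entry may be unlinked and freed only when the table holds the last reference) can be sketched in user space as below. `struct entry`, `entry_add()` and `entry_remove_one()` are hypothetical names for illustration, and the sketch ignores locking and atomics, which the kernel code needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table entry; refcnt starts at 1 for the table's own reference. */
struct entry {
    struct entry *next;
    int key;
    int refcnt;
};

static struct entry *entry_add(struct entry **head, int key)
{
    struct entry *e = malloc(sizeof(*e));

    if (!e)
        return NULL;
    e->key = key;
    e->refcnt = 1;
    e->next = *head;
    *head = e;
    return e;
}

/*
 * Remove and free the entry with the given key, but only if the table's
 * reference is the only one left. Returns 1 if removed, 0 otherwise.
 */
static int entry_remove_one(struct entry **head, int key)
{
    struct entry **pp, *e;

    assert(head != NULL);
    for (pp = head; (e = *pp) != NULL; pp = &e->next) {
        if (e->key != key)
            continue;
        if (e->refcnt != 1)
            return 0;       /* someone else still uses it */
        *pp = e->next;      /* unlink from the table */
        free(e);
        return 1;
    }
    return 0;
}
```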
  14. 17 May, 2017 1 commit
    • neighbour: update neigh timestamps iff update is effective · 77d71233
      Committed by Ihar Hrachyshka
      It's a common practice to send gratuitous ARPs after moving an
      IP address to another device to speed up healing of a service. To
      fulfill service availability constraints, the timing of network peers
      updating their caches to point to a new location of an IP address can be
      particularly important.
      
      Sometimes neigh_update calls touch neither lladdr nor state, for
      example if an update arrives within the locktime interval. The neigh->updated
      value is tested by the protocol specific neigh code, which in turn
      will influence whether NEIGH_UPDATE_F_OVERRIDE gets set in the
      call to neigh_update() or not. As a result, we may effectively ignore
      the update request, bailing out of touching the neigh entry, except that
      we still bump its timestamps inside neigh_update.
      
      This may be a problem for updates arriving in quick succession. For
      example, consider the following scenario:
      
      A service is moved to another device with its IP address. The new device
      sends three gratuitous ARP requests into the network at ~1 second
      intervals. Just before the first request arrives at one of the
      network peer nodes, its neigh entry for the IP address transitions from
      STALE to DELAY. This transition, among other things, updates
      neigh->updated. Once the kernel receives the first gratuitous ARP, it
      ignores it because its arrival time is inside the locktime interval. The
      kernel still bumps neigh->updated. Then the second gratuitous ARP
      request arrives, and it's also ignored because it's still in the (new)
      locktime interval. The same happens for the third request. The node
      eventually heals itself (delay_first_probe_time seconds after the
      initial transition to DELAY state), but it has wasted some time and
      requires a new ARP request/reply round trip. This unfortunate behaviour
      both puts more load on the network and reduces service
      availability.
      
      This patch changes neigh_update so that it bumps neigh->updated (as well
      as neigh->confirmed) only once we are sure that either lladdr or entry
      state will change. In the scenario described above, it means that the
      second gratuitous ARP request will actually update the entry lladdr.
      
      Ideally, we would update the neigh entry on the very first gratuitous
      ARP request. The locktime mechanism is designed to ignore ARP updates in
      a short timeframe after a previous ARP update was honoured by the kernel
      layer. This would require tracking timestamps for state transitions
      separately from timestamps when actual updates are received. This would
      probably involve changes in neighbour struct. Therefore, the patch
      doesn't tackle the issue of the first gratuitous ARP being ignored, leaving
      it for a follow-up.

      Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
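The gratuitous ARP scenario from this commit message can be replayed with a toy model. The sketch below assumes a 1000 ms locktime, millisecond timestamps, and a single `updated` field. `garp_receive()` and its `bump_always` switch (1 for the pre-patch behaviour, 0 for the patched one) are hypothetical and much simpler than the real neigh_update() path.

```c
#include <assert.h>
#include <string.h>

#define LOCKTIME_MS 1000UL   /* assumed locktime for this model */

struct neigh_sketch {
    unsigned char lladdr[6];
    unsigned long updated;   /* time of the last recorded update, in ms */
};

/*
 * Process one gratuitous ARP carrying lladdr at time now.
 * Returns 1 if the entry's lladdr was changed, 0 if the update was ignored.
 */
static int garp_receive(struct neigh_sketch *n, const unsigned char *lladdr,
                        unsigned long now, int bump_always)
{
    int override, differs;

    assert(now >= n->updated);
    override = (now - n->updated) >= LOCKTIME_MS;
    differs = memcmp(n->lladdr, lladdr, sizeof(n->lladdr)) != 0;

    if (!differs || !override) {
        if (bump_always)
            n->updated = now;   /* pre-patch: timestamp moves anyway */
        return 0;
    }
    memcpy(n->lladdr, lladdr, sizeof(n->lladdr));
    n->updated = now;           /* effective update: timestamp moves */
    return 1;
}
```

With requests at 500, 1400 and 2300 ms after the transition to DELAY, the pre-patch mode ignores all three because each ignored request restarts the locktime window. The patched mode accepts the second one.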
  15. 18 Apr, 2017 1 commit
  16. 14 Apr, 2017 1 commit
  17. 24 Mar, 2017 1 commit
  18. 23 Mar, 2017 1 commit
  19. 16 Feb, 2017 1 commit
    • net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification · 7627ae60
      Committed by Marcus Huewe
      When setting a neigh related sysctl parameter, we always send a
      NETEVENT_DELAY_PROBE_TIME_UPDATE netevent. For instance, when
      executing
      
      	sysctl net.ipv6.neigh.wlp3s0.retrans_time_ms=2000
      
      a NETEVENT_DELAY_PROBE_TIME_UPDATE netevent is generated.
      
      This is caused by commit 2a4501ae ("neigh: Send a
      notification when DELAY_PROBE_TIME changes"). According to the
      commit's description, it was intended to generate such an event
      when setting the "delay_first_probe_time" sysctl parameter.
      
      In order to fix this, only generate this event when actually
      setting the "delay_first_probe_time" sysctl parameter. This fix
      should not have any unintended side-effects, because all but one
      registered netevent callbacks check for other netevent event
      types (the registered callbacks were obtained by grepping for
      "register_netevent_notifier"). The only callback that uses the
      NETEVENT_DELAY_PROBE_TIME_UPDATE event is
      mlxsw_sp_router_netevent_event() (in
      drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c): in case
      of this event, it only accesses the DELAY_PROBE_TIME of the
      passed neigh_parms.
      
      Fixes: 2a4501ae ("neigh: Send a notification when DELAY_PROBE_TIME changes")
      Signed-off-by: Marcus Huewe <suse-tux@gmx.de>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
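The behavioural change can be pictured with a small sketch in which a generic parameter setter raises the event only for the one parameter it concerns. The enum values, `struct parms_sketch` and `param_set()` are hypothetical. The real code dispatches through sysctl handlers and the netevent notifier chain.

```c
#include <assert.h>

/* Hypothetical subset of per-device neigh parameters. */
enum neigh_param_sketch {
    PARAM_RETRANS_TIME,
    PARAM_DELAY_PROBE_TIME,
    PARAM_MAX
};

struct parms_sketch {
    int data[PARAM_MAX];
    int events_sent;     /* counts DELAY_PROBE_TIME update notifications */
};

/* Stands in for calling the notifier chain. */
static void notify_delay_probe_time(struct parms_sketch *p)
{
    p->events_sent++;
}

/* Store the value, and notify only if the delay probe time was written. */
static void param_set(struct parms_sketch *p, enum neigh_param_sketch which,
                      int val)
{
    assert(which < PARAM_MAX);
    p->data[which] = val;
    if (which == PARAM_DELAY_PROBE_TIME)
        notify_delay_probe_time(p);
}
```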
  20. 24 Dec, 2016 1 commit
    • neigh: Send netevent after marking neigh as dead · 53f800e3
      Committed by Ido Schimmel
      neigh_cleanup_and_release() is always called after marking a neighbour
      as dead, but it only notifies user space and not in-kernel listeners of
      the netevent notification chain.
      
      This can cause multiple problems. In my specific use case, it causes the
      listener (a switch driver capable of L3 offloads) to believe a neighbour
      entry is still valid, and is thus erroneously kept in the device's
      table.
      
      Fix that by sending a netevent after marking the neighbour as dead.
      
      Fixes: a6bf9e93 ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 01 Dec, 2016 1 commit
    • neigh: remove duplicate check for same neigh · 18502acd
      Committed by Zhang Shengju
      Currently the loop index 'idx' is used as the index in the neigh list of interest.
      It is increased only when a neigh is dumped, so it is not the absolute index in
      the list. Because nothing records which neighs have already been scanned
      by a previous loop, the filtered-out neighs are scanned multiple
      times.

      This patch makes idx the absolute index in the list: it increases no matter
      whether the neigh is filtered. This prevents the above problem.

      And this is in line with other dump functions.
      
      v2:
       - take David Ahern's advice to do simple change

      Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
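A resumable dump with an absolute index might look like the sketch below: the index advances for every entry, filtered or not, so a resumed call starts exactly where the previous one stopped. `dump_chunk()`, `filtered_out()` and the "skip odd values" filter are hypothetical. The real dump walks hash buckets and fills a netlink skb.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical filter: pretend odd-valued entries do not match the request. */
static int filtered_out(int entry)
{
    return entry % 2 != 0;
}

/*
 * Copy up to room matching entries into out, starting at absolute index
 * s_idx. Returns the absolute index at which the next call should resume.
 */
static size_t dump_chunk(const int *entries, size_t n, size_t s_idx,
                         int *out, size_t room, size_t *nout)
{
    size_t idx;

    assert(s_idx <= n);
    *nout = 0;
    for (idx = s_idx; idx < n; idx++) {
        if (filtered_out(entries[idx]))
            continue;           /* idx still advances past filtered entries */
        if (*nout == room)
            break;              /* buffer full: resume from idx next time */
        out[(*nout)++] = entries[idx];
    }
    return idx;
}
```

If the index counted only dumped entries, a resumed call would have to rescan every filtered entry that precedes the resume point, which is the repeated work the commit removes.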
  22. 09 Aug, 2016 1 commit
    • neigh: allow admin to set NUD_STALE · 0e7bbcc1
      Committed by Julian Anastasov
      Admin should be able to set any state. Currently, this fails
      when lladdr is not changed and state is changed from
      NUD_CONNECTED to NUD_STALE:
      
      ip neigh add 192.168.8.1 lladdr 00:11:22:33:44:55 nud perm dev wlan0
      ip neigh show to 192.168.8.1
      192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT
      ip neigh change 192.168.8.1 lladdr 00:11:22:33:44:55 nud stale dev wlan0
      ip neigh show to 192.168.8.1
      192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT
      
      Problem may be from 2.1.X days.

      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Reviewed-by: Chunhui He <hchunhui@mail.ustc.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 27 Jul, 2016 1 commit
    • net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update() · d1c2b501
      Committed by He Chunhui
      NUD_STALE is used when the caller (e.g. arp_process()) can't guarantee
      neighbour reachability. If the entry was NUD_VALID and lladdr is unchanged,
      the entry state should not be changed.
      
      Currently the code puts an extra "NUD_CONNECTED" condition. So if old state
      was NUD_DELAY or NUD_PROBE (they are NUD_VALID but not NUD_CONNECTED), the
      state can be changed to NUD_STALE.
      
      This may cause a problem. Because a NUD_STALE lladdr doesn't guarantee
      reachability, when we send traffic, the state will be changed to
      NUD_DELAY. In the normal case, if we get no confirmation (by dst_confirm()),
      we will change the state to NUD_PROBE and send probe traffic. But now the
      state may be reset to NUD_STALE again (e.g. by broadcast ARP packets),
      so the probe traffic will not be sent. This situation may happen again and
      again, and packets will be sent to a non-reachable lladdr forever.
      
      The fix is to remove the "NUD_CONNECTED" condition. After that the
      "NEIGH_UPDATE_F_WEAK_OVERRIDE" condition (used by IPv6) in that branch will
      be redundant, so remove it.
      
      This change may increase probe traffic, but it's essential since NUD_STALE
      lladdr is unreliable. To ensure correctness, we prefer to resolve lladdr,
      when we can't get confirmation, even while remote packets try to set
      NUD_STALE state.

      Signed-off-by: Chunhui He <hchunhui@mail.ustc.edu.cn>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
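The state rule from this commit can be written as a small pure function. The NUD_* values below mirror the kernel headers as far as I recall them and should be treated as assumptions, and `next_state()` is a hypothetical helper. It deliberately leaves out the administrative override that the later "allow admin to set NUD_STALE" commit introduces.

```c
#include <assert.h>

/* Assumed to match the kernel's NUD_* bit values; verify before relying on them. */
#define NUD_REACHABLE 0x02
#define NUD_STALE     0x04
#define NUD_DELAY     0x08
#define NUD_PROBE     0x10
#define NUD_FAILED    0x20
#define NUD_NOARP     0x40
#define NUD_PERMANENT 0x80
#define NUD_VALID (NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE | \
                   NUD_PROBE | NUD_STALE | NUD_DELAY)

/*
 * Decide the state after an update request.
 * A request for NUD_STALE that carries the same lladdr must not demote an
 * entry that is already valid, whether it is connected or merely in
 * DELAY/PROBE.
 */
static int next_state(int old, int requested, int lladdr_changed)
{
    assert(requested != 0);
    if ((old & NUD_VALID) && !lladdr_changed && requested == NUD_STALE)
        return old;
    return requested;
}
```

Keeping DELAY and PROBE intact lets the probe cycle finish, so an unreachable lladdr is eventually detected rather than refreshed forever by unrelated broadcast traffic.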
  24. 06 Jul, 2016 2 commits
    • I
      neigh: Send a notification when DELAY_PROBE_TIME changes · 2a4501ae
      Ido Schimmel committed
      When the data plane is offloaded the traffic doesn't go through the
      networking stack. Therefore, after first resolving a neighbour the NUD
      state machine will transition it from REACHABLE to STALE until it's
      finally deleted by the garbage collector.
      
      To prevent such situations the offloading driver should notify the NUD
      state machine on any neighbours that were recently used. The driver's
      polling interval should be set so that the NUD state machine can
      function as if the traffic wasn't offloaded.
      
      Currently, there are no in-tree drivers that can report confirmation for
      a neighbour, but only 'used' indication. Therefore, the polling interval
      should be set according to DELAY_FIRST_PROBE_TIME, as a neighbour will
      transition from REACHABLE state to DELAY (instead of STALE) if "a packet
      was sent within the last DELAY_FIRST_PROBE_TIME seconds" (RFC 4861).
      
      Send a netevent whenever the DELAY_FIRST_PROBE_TIME changes - either via
      netlink or sysctl - so that offloading drivers can correctly set their
      polling interval.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a4501ae
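The notification flow described in 2a4501ae can be sketched in userspace as follows. All types and function names here are hypothetical stand-ins: the kernel delivers the event through its netevent notifier chain, and a real driver may derive its polling interval from the value differently. Using the value directly as the interval is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the neighbour parameters that hold the timer. */
struct parms_model {
    unsigned long delay_probe_time_ms;
};

/* Hypothetical listener signature for the change notification. */
typedef void (*delay_probe_listener)(const struct parms_model *p, void *ctx);

struct notifier_model {
    delay_probe_listener fn;
    void *ctx;
};

/* Store the new value and notify the listener. The netlink and sysctl
 * paths would both end up in a function like this one. */
void set_delay_probe_time(struct parms_model *p, unsigned long ms,
                          const struct notifier_model *n)
{
    assert(p != NULL);
    p->delay_probe_time_ms = ms;
    if (n != NULL && n->fn != NULL)
        n->fn(p, n->ctx);
}

/* Hypothetical offloading driver state. */
struct driver_model {
    unsigned long poll_interval_ms;
};

/* The driver follows the new value so that its 'used' indications reach
 * the NUD state machine within DELAY_FIRST_PROBE_TIME. */
void driver_on_delay_probe_change(const struct parms_model *p, void *ctx)
{
    struct driver_model *drv = ctx;

    drv->poll_interval_ms = p->delay_probe_time_ms;
}
```

A driver would register `driver_on_delay_probe_change` once and then have its polling interval follow every later change, whichever interface made it.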
    • J
      net: add dev arg to ndo_neigh_construct/destroy · 503eebc2
      Jiri Pirko committed
      As the following patch will allow upper devices to follow the call down
      lower devices, we need to add dev here and not rely on n->dev.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      503eebc2
  25. 29 Jun, 2016 1 commit
    • D
      neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit() · b560f03d
      David Barroso committed
      neigh_xmit() expects to be called inside an RCU-bh read side critical
      section, and while one of its two current callers gets this right, the
      other one doesn't.
      
      More specifically, neigh_xmit() has two callers, mpls_forward() and
      mpls_output(), and while both callers call neigh_xmit() under
      rcu_read_lock(), this provides sufficient protection for neigh_xmit()
      only in the case of mpls_forward(), as that is always called from
      softirq context and therefore doesn't need explicit BH protection,
      while mpls_output() can be called from process context with softirqs
      enabled.
      
      When mpls_output() is called from process context, with softirqs
      enabled, we can be preempted by a softirq at any time, and RCU-bh
      considers the completion of a softirq as signaling the end of any
      pending read-side critical sections, so if we do get a softirq
      while we are in the part of neigh_xmit() that expects to be run inside
      an RCU-bh read side critical section, we can end up with an unexpected
      RCU grace period running right in the middle of that critical section,
      making things go boom.
      
      This patch fixes this impedance mismatch in the callee, by making
      neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
      expects to be treated as an RCU-bh read side critical section, as this
      seems a safer option than fixing it in the callers.
      
      Fixes: 4fd3d7d9 ("neigh: Add helper function neigh_xmit")
      Signed-off-by: David Barroso <dbarroso@fastly.com>
      Signed-off-by: Lennert Buytenhek <lbuytenhek@fastly.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b560f03d
  26. 27 Apr, 2016 1 commit
  27. 24 Apr, 2016 1 commit
  28. 03 Dec, 2015 1 commit
  29. 18 Nov, 2015 1 commit
  30. 07 Oct, 2015 1 commit
  31. 30 Sep, 2015 1 commit
  32. 11 Aug, 2015 1 commit
    • R
      net: add explicit logging and stat for neighbour table overflow · fb811395
      Rick Jones committed
      Add an explicit neighbour table overflow message (ratelimited) and
      statistic to make diagnosing neighbour table overflows tractable in
      the wild.
      
      Diagnosing a neighbour table overflow can be quite difficult in the wild
      because there is no explicit message logged to dmesg.  Callers to neighbour code
      seem to use net_dbg_ratelimited when the neighbour call fails which means
      the "base message" is not emitted and the callback suppressed messages
      from the ratelimiting can end up juxtaposed with unrelated messages.
      Further, a forced garbage collection will increment a stat on each call
      whether it was successful in freeing up a table entry or not, so that
      statistic is only a hint.  So, add a net_info_ratelimited message and
      explicit statistic to the neighbour code.
      Signed-off-by: Rick Jones <rick.jones2@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fb811395
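The difference between the "hint" statistic and the explicit overflow statistic in fb811395 can be modelled in a few lines of userspace C. This is a sketch under assumptions: the struct, its field names, the single hard limit and the time-based rate limit are illustrative and do not reproduce the kernel's allocation path or its ratelimiting helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical table model; all field names are illustrative. */
struct table_model {
    int entries;
    int limit;                     /* hard limit on the number of entries */
    unsigned long forced_gc_runs;  /* bumped on every forced GC: only a hint */
    unsigned long table_fulls;     /* bumped only when an allocation is refused */
    long last_log;                 /* time of the last emitted message (seconds) */
    long log_interval;             /* minimum seconds between two messages */
    unsigned long logs_emitted;
};

/* Pretend garbage collector: frees up to 'reclaimable' entries and reports
 * whether anything was freed. The counter moves either way. */
static bool forced_gc(struct table_model *t, int reclaimable)
{
    t->forced_gc_runs++;
    if (reclaimable <= 0)
        return false;
    if (reclaimable > t->entries)
        reclaimable = t->entries;
    t->entries -= reclaimable;
    return true;
}

/* Try to add one entry at time 'now'. On overflow, count it explicitly and
 * emit a rate-limited message. */
bool table_alloc(struct table_model *t, int reclaimable, long now)
{
    assert(t != NULL);
    if (t->entries >= t->limit && !forced_gc(t, reclaimable)) {
        t->table_fulls++;
        if (t->logs_emitted == 0 || now - t->last_log >= t->log_interval) {
            fprintf(stderr, "model: neighbour table overflow\n");
            t->last_log = now;
            t->logs_emitted++;
        }
        return false;
    }
    t->entries++;
    return true;
}
```

In this model `forced_gc_runs` increases even when the collector frees an entry and the allocation succeeds, while `table_fulls` increases only for refused allocations, which is why the second counter is the one to watch when diagnosing overflows.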
  33. 22 Jun, 2015 1 commit
    • J
      neigh: do not modify unlinked entries · 2c51a97f
      Julian Anastasov committed
      The lockless lookups can return an entry that is unlinked.
      Sometimes they get a reference before the last neigh_cleanup_and_release,
      sometimes they do not need a reference. Later, any
      modification attempts may result in the following problems:
      
      1. The entry is not destroyed immediately because neigh_update
      can start the timer for a dead entry, e.g. on change to NUD_REACHABLE
      state. As a result, the entry lives for some time but is invisible
      and out of control.
      
      2. __neigh_event_send can run in parallel with neigh_destroy
      while refcnt=0 but if the timer is started and expires, refcnt can
      reach 0 for a second time, leading to a second neigh_destroy and
      a possible crash.
      
      Thanks to Eric Dumazet and Ying Xue for their work and analysis
      on the __neigh_event_send change.
      
      Fixes: 767e97e1 ("neigh: RCU conversion of struct neighbour")
      Fixes: a263b309 ("ipv4: Make neigh lookups directly in output packet path.")
      Fixes: 6fd6ce20 ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2c51a97f
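The guard implied by 2c51a97f, refusing to modify an entry that has already been unlinked, can be sketched as below. The struct and function are hypothetical and far simpler than the kernel's struct neighbour. Locking and reference counting are omitted on purpose, so this only shows the shape of the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified neighbour entry. */
struct entry_model {
    bool dead;         /* set once the entry has been unlinked from the table */
    int  state;
    bool timer_armed;
};

/* Apply a state change unless the entry is already unlinked.
 * Returns true if the update was applied. */
bool entry_update_model(struct entry_model *e, int new_state, bool needs_timer)
{
    assert(e != NULL);
    if (e->dead)
        return false;  /* never re-arm a timer on an entry awaiting destruction */
    e->state = new_state;
    e->timer_armed = needs_timer;
    return true;
}
```

Because the dead entry is left untouched, no timer can hold it alive after it became invisible, and no late timer expiry can drop its reference count to zero a second time.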
  34. 22 May, 2015 1 commit
    • E
      neigh: Better handling of transition to NUD_PROBE state · 765c9c63
      Erik Kline committed
      [1] When entering NUD_PROBE state via neigh_update(), perhaps received
          from userspace, correctly (re)initialize the probes count to zero.
      
          This is useful for forcing revalidation of a neighbor (for example
          if the host is attempting to do DNA [IPv4: RFC 4436, IPv6: RFC 6059]).
      
      [2] Notify listeners when a neighbor goes into NUD_PROBE state.
      
          By sending notifications on entry to NUD_PROBE state listeners get
          more timely warnings of imminent connectivity issues.
      
          The current notifications on entry to NUD_STALE have somewhat
          limited usefulness: NUD_STALE is a perfectly normal state, as is
          NUD_DELAY, whereas notifications on entry to NUD_FAILED come after
          a neighbor reachability problem has been confirmed (typically after
          three probes).
      Signed-off-by: Erik Kline <ek@google.com>
      Acked-By: Lorenzo Colitti <lorenzo@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      765c9c63
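Both points of 765c9c63 can be illustrated with a minimal userspace model. The struct and function names are hypothetical, and only the NUD_* bit values come from the kernel headers. The real code has further reasons to notify listeners, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Subset of the NUD state bits, matching the kernel's values. */
enum {
    NUD_REACHABLE = 0x02,
    NUD_STALE     = 0x04,
    NUD_DELAY     = 0x08,
    NUD_PROBE     = 0x10,
};

/* Hypothetical, simplified neighbour entry. */
struct probe_model {
    int state;
    int probes;   /* probes sent so far in the current probing round */
};

/* Change the state. On entry to NUD_PROBE, restart the probe count [1]
 * and report that listeners should be notified [2]. */
bool set_state_model(struct probe_model *n, int new_state)
{
    bool notify = false;

    assert(n != NULL);
    if (new_state != n->state) {
        if (new_state == NUD_PROBE) {
            n->probes = 0;
            notify = true;
        }
        n->state = new_state;
    }
    return notify;
}
```

Resetting the counter only on an actual transition means a repeated request for NUD_PROBE does not restart a probing round that is already in progress in this model.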
  35. 21 Mar, 2015 1 commit
  36. 13 Mar, 2015 1 commit
  37. 09 Mar, 2015 1 commit