1. 01 7月, 2017 2 次提交
  2. 05 6月, 2017 1 次提交
    • S
      neigh: Really delete an arp/neigh entry on "ip neigh delete" or "arp -d" · 5071034e
      Sowmini Varadhan 提交于
      The command
        # arp -s 62.2.0.1 a:b:c:d:e:f dev eth2
      adds an entry like the following (listed by "arp -an")
        ? (62.2.0.1) at 0a:0b:0c:0d:0e:0f [ether] PERM on eth2
      but the symmetric deletion command
        # arp -i eth2 -d 62.2.0.1
      does not remove the PERM entry from the table, and instead leaves behind
        ? (62.2.0.1) at <incomplete> on eth2
      
      The reason is that there is a refcnt of 1 for the arp_tbl itself
      (neigh_alloc starts off the entry with a refcnt of 1), thus
      the neigh_release() call from arp_invalidate() will (at best) just
      decrement the ref to 1, but will never actually free it from the
      table.
      
      To fix this, we need to do something like neigh_forced_gc: if
      the refcnt is 1 (i.e., on the table's ref), remove the entry from
      the table and free it. This patch refactors and shares common code
      between neigh_forced_gc and the newly added neigh_remove_one.
      
      A similar issue exists for IPv6 Neighbor Cache entries, and is fixed
      in a similar manner by this patch.
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Reviewed-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5071034e
  3. 17 5月, 2017 1 次提交
    • I
      neighbour: update neigh timestamps iff update is effective · 77d71233
      Ihar Hrachyshka 提交于
      It's a common practice to send gratuitous ARPs after moving an
      IP address to another device to speed up healing of a service. To
      fulfill service availability constraints, the timing of network peers
      updating their caches to point to a new location of an IP address can be
      particularly important.
      
      Sometimes neigh_update calls won't touch neither lladdr nor state, for
      example if an update arrives in locktime interval. The neigh->updated
      value is tested by the protocol specific neigh code, which in turn
      will influence whether NEIGH_UPDATE_F_OVERRIDE gets set in the
      call to neigh_update() or not. As a result, we may effectively ignore
      the update request, bailing out of touching the neigh entry, except that
      we still bump its timestamps inside neigh_update.
      
      This may be a problem for updates arriving in quick succession. For
      example, consider the following scenario:
      
      A service is moved to another device with its IP address. The new device
      sends three gratuitous ARP requests into the network with ~1 seconds
      interval between them. Just before the first request arrives to one of
      network peer nodes, its neigh entry for the IP address transitions from
      STALE to DELAY.  This transition, among other things, updates
      neigh->updated. Once the kernel receives the first gratuitous ARP, it
      ignores it because its arrival time is inside the locktime interval. The
      kernel still bumps neigh->updated. Then the second gratuitous ARP
      request arrives, and it's also ignored because it's still in the (new)
      locktime interval. Same happens for the third request. The node
      eventually heals itself (after delay_first_probe_time seconds since the
      initial transition to DELAY state), but it just wasted some time and
      require a new ARP request/reply round trip. This unfortunate behaviour
      both puts more load on the network, as well as reduces service
      availability.
      
      This patch changes neigh_update so that it bumps neigh->updated (as well
      as neigh->confirmed) only once we are sure that either lladdr or entry
      state will change). In the scenario described above, it means that the
      second gratuitous ARP request will actually update the entry lladdr.
      
      Ideally, we would update the neigh entry on the very first gratuitous
      ARP request. The locktime mechanism is designed to ignore ARP updates in
      a short timeframe after a previous ARP update was honoured by the kernel
      layer. This would require tracking timestamps for state transitions
      separately from timestamps when actual updates are received. This would
      probably involve changes in neighbour struct. Therefore, the patch
      doesn't tackle the issue of the first gratuitous APR ignored, leaving
      it for a follow-up.
      Signed-off-by: NIhar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      77d71233
  4. 18 4月, 2017 1 次提交
  5. 14 4月, 2017 1 次提交
  6. 24 3月, 2017 1 次提交
  7. 23 3月, 2017 1 次提交
  8. 16 2月, 2017 1 次提交
    • M
      net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification · 7627ae60
      Marcus Huewe 提交于
      When setting a neigh related sysctl parameter, we always send a
      NETEVENT_DELAY_PROBE_TIME_UPDATE netevent. For instance, when
      executing
      
      	sysctl net.ipv6.neigh.wlp3s0.retrans_time_ms=2000
      
      a NETEVENT_DELAY_PROBE_TIME_UPDATE netevent is generated.
      
      This is caused by commit 2a4501ae ("neigh: Send a
      notification when DELAY_PROBE_TIME changes"). According to the
      commit's description, it was intended to generate such an event
      when setting the "delay_first_probe_time" sysctl parameter.
      
      In order to fix this, only generate this event when actually
      setting the "delay_first_probe_time" sysctl parameter. This fix
      should not have any unintended side-effects, because all but one
      registered netevent callbacks check for other netevent event
      types (the registered callbacks were obtained by grepping for
      "register_netevent_notifier"). The only callback that uses the
      NETEVENT_DELAY_PROBE_TIME_UPDATE event is
      mlxsw_sp_router_netevent_event() (in
      drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c): in case
      of this event, it only accesses the DELAY_PROBE_TIME of the
      passed neigh_parms.
      
      Fixes: 2a4501ae ("neigh: Send a notification when DELAY_PROBE_TIME changes")
      Signed-off-by: NMarcus Huewe <suse-tux@gmx.de>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7627ae60
  9. 24 12月, 2016 1 次提交
    • I
      neigh: Send netevent after marking neigh as dead · 53f800e3
      Ido Schimmel 提交于
      neigh_cleanup_and_release() is always called after marking a neighbour
      as dead, but it only notifies user space and not in-kernel listeners of
      the netevent notification chain.
      
      This can cause multiple problems. In my specific use case, it causes the
      listener (a switch driver capable of L3 offloads) to believe a neighbour
      entry is still valid, and is thus erroneously kept in the device's
      table.
      
      Fix that by sending a netevent after marking the neighbour as dead.
      
      Fixes: a6bf9e93 ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53f800e3
  10. 01 12月, 2016 1 次提交
    • Z
      neigh: remove duplicate check for same neigh · 18502acd
      Zhang Shengju 提交于
      Currently loop index 'idx' is used as the index in the neigh list of interest.
      It's increased only when the neigh is dumped. It's not the absolute index in
      the list. Because there is no info to record which neigh has already be scanned
      by previous loop. This will cause the filtered out neighs to be scanned mulitple
      times.
      
      This patch make idx as the absolute index in the list, it will increase no matter
      whether the neigh is filtered. This will prevent the above problem.
      
      And this is in line with other dump functions.
      
      v2:
       - take David Ahern's advice to do simple change
      Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18502acd
  11. 09 8月, 2016 1 次提交
    • J
      neigh: allow admin to set NUD_STALE · 0e7bbcc1
      Julian Anastasov 提交于
      Admin should be able to set any state. Currently, this fails
      when lladdr is not changed and state is changed from
      NUD_CONNECTED to NUD_STALE:
      
      ip neigh add 192.168.8.1 lladdr 00:11:22:33:44:55 nud perm dev wlan0
      ip neigh show to 192.168.8.1
      192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT
      ip neigh change 192.168.8.1 lladdr 00:11:22:33:44:55 nud stale dev wlan0
      ip neigh show to 192.168.8.1
      192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT
      
      Problem may be from 2.1.X days.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Reviewed-by: NChunhui He <hchunhui@mail.ustc.edu.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e7bbcc1
  12. 27 7月, 2016 1 次提交
    • H
      net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update() · d1c2b501
      He Chunhui 提交于
      NUD_STALE is used when the caller(e.g. arp_process()) can't guarantee
      neighbour reachability. If the entry was NUD_VALID and lladdr is unchanged,
      the entry state should not be changed.
      
      Currently the code puts an extra "NUD_CONNECTED" condition. So if old state
      was NUD_DELAY or NUD_PROBE (they are NUD_VALID but not NUD_CONNECTED), the
      state can be changed to NUD_STALE.
      
      This may cause problem. Because NUD_STALE lladdr doesn't guarantee
      reachability, when we send traffic, the state will be changed to
      NUD_DELAY. In normal case, if we get no confirmation (by dst_confirm()),
      we will change the state to NUD_PROBE and send probe traffic. But now the
      state may be reset to NUD_STALE again(e.g. by broadcast ARP packets),
      so the probe traffic will not be sent. This situation may happen again and
      again, and packets will be sent to an non-reachable lladdr forever.
      
      The fix is to remove the "NUD_CONNECTED" condition. After that the
      "NEIGH_UPDATE_F_WEAK_OVERRIDE" condition (used by IPv6) in that branch will
      be redundant, so remove it.
      
      This change may increase probe traffic, but it's essential since NUD_STALE
      lladdr is unreliable. To ensure correctness, we prefer to resolve lladdr,
      when we can't get confirmation, even while remote packets try to set
      NUD_STALE state.
      Signed-off-by: NChunhui He <hchunhui@mail.ustc.edu.cn>
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d1c2b501
  13. 06 7月, 2016 2 次提交
    • I
      neigh: Send a notification when DELAY_PROBE_TIME changes · 2a4501ae
      Ido Schimmel 提交于
      When the data plane is offloaded the traffic doesn't go through the
      networking stack. Therefore, after first resolving a neighbour the NUD
      state machine will transition it from REACHABLE to STALE until it's
      finally deleted by the garbage collector.
      
      To prevent such situations the offloading driver should notify the NUD
      state machine on any neighbours that were recently used. The driver's
      polling interval should be set so that the NUD state machine can
      function as if the traffic wasn't offloaded.
      
      Currently, there are no in-tree drivers that can report confirmation for
      a neighbour, but only 'used' indication. Therefore, the polling interval
      should be set according to DELAY_FIRST_PROBE_TIME, as a neighbour will
      transition from REACHABLE state to DELAY (instead of STALE) if "a packet
      was sent within the last DELAY_FIRST_PROBE_TIME seconds" (RFC 4861).
      
      Send a netevent whenever the DELAY_FIRST_PROBE_TIME changes - either via
      netlink or sysctl - so that offloading drivers can correctly set their
      polling interval.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2a4501ae
    • J
      net: add dev arg to ndo_neigh_construct/destroy · 503eebc2
      Jiri Pirko 提交于
      As the following patch will allow upper devices to follow the call down
      lower devices, we need to add dev here and not rely on n->dev.
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      503eebc2
  14. 29 6月, 2016 1 次提交
    • D
      neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit() · b560f03d
      David Barroso 提交于
      neigh_xmit() expects to be called inside an RCU-bh read side critical
      section, and while one of its two current callers gets this right, the
      other one doesn't.
      
      More specifically, neigh_xmit() has two callers, mpls_forward() and
      mpls_output(), and while both callers call neigh_xmit() under
      rcu_read_lock(), this provides sufficient protection for neigh_xmit()
      only in the case of mpls_forward(), as that is always called from
      softirq context and therefore doesn't need explicit BH protection,
      while mpls_output() can be called from process context with softirqs
      enabled.
      
      When mpls_output() is called from process context, with softirqs
      enabled, we can be preempted by a softirq at any time, and RCU-bh
      considers the completion of a softirq as signaling the end of any
      pending read-side critical sections, so if we do get a softirq
      while we are in the part of neigh_xmit() that expects to be run inside
      an RCU-bh read side critical section, we can end up with an unexpected
      RCU grace period running right in the middle of that critical section,
      making things go boom.
      
      This patch fixes this impedance mismatch in the callee, by making
      neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
      expects to be treated as an RCU-bh read side critical section, as this
      seems a safer option than fixing it in the callers.
      
      Fixes: 4fd3d7d9 ("neigh: Add helper function neigh_xmit")
      Signed-off-by: NDavid Barroso <dbarroso@fastly.com>
      Signed-off-by: NLennert Buytenhek <lbuytenhek@fastly.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Acked-by: NRobert Shearman <rshearma@brocade.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b560f03d
  15. 27 4月, 2016 1 次提交
  16. 24 4月, 2016 1 次提交
  17. 03 12月, 2015 1 次提交
  18. 18 11月, 2015 1 次提交
  19. 07 10月, 2015 1 次提交
  20. 30 9月, 2015 1 次提交
  21. 11 8月, 2015 1 次提交
    • R
      net: add explicit logging and stat for neighbour table overflow · fb811395
      Rick Jones 提交于
      Add an explicit neighbour table overflow message (ratelimited) and
      statistic to make diagnosing neighbour table overflows tractable in
      the wild.
      
      Diagnosing a neighbour table overflow can be quite difficult in the wild
      because there is no explicit dmesg logged.  Callers to neighbour code
      seem to use net_dbg_ratelimit when the neighbour call fails which means
      the "base message" is not emitted and the callback suppressed messages
      from the ratelimiting can end-up juxtaposed with unrelated messages.
      Further, a forced garbage collection will increment a stat on each call
      whether it was successful in freeing-up a table entry or not, so that
      statistic is only a hint.  So, add a net_info_ratelimited message and
      explicit statistic to the neighbour code.
      Signed-off-by: NRick Jones <rick.jones2@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fb811395
  22. 22 6月, 2015 1 次提交
    • J
      neigh: do not modify unlinked entries · 2c51a97f
      Julian Anastasov 提交于
      The lockless lookups can return entry that is unlinked.
      Sometimes they get reference before last neigh_cleanup_and_release,
      sometimes they do not need reference. Later, any
      modification attempts may result in the following problems:
      
      1. entry is not destroyed immediately because neigh_update
      can start the timer for dead entry, eg. on change to NUD_REACHABLE
      state. As result, entry lives for some time but is invisible
      and out of control.
      
      2. __neigh_event_send can run in parallel with neigh_destroy
      while refcnt=0 but if timer is started and expired refcnt can
      reach 0 for second time leading to second neigh_destroy and
      possible crash.
      
      Thanks to Eric Dumazet and Ying Xue for their work and analyze
      on the __neigh_event_send change.
      
      Fixes: 767e97e1 ("neigh: RCU conversion of struct neighbour")
      Fixes: a263b309 ("ipv4: Make neigh lookups directly in output packet path.")
      Fixes: 6fd6ce20 ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c51a97f
  23. 22 5月, 2015 1 次提交
    • E
      neigh: Better handling of transition to NUD_PROBE state · 765c9c63
      Erik Kline 提交于
      [1] When entering NUD_PROBE state via neigh_update(), perhaps received
          from userspace, correctly (re)initialize the probes count to zero.
      
          This is useful for forcing revalidation of a neighbor (for example
          if the host is attempting to do DNA [IPv4 4436, IPv6 6059]).
      
      [2] Notify listeners when a neighbor goes into NUD_PROBE state.
      
          By sending notifications on entry to NUD_PROBE state listeners get
          more timely warnings of imminent connectivity issues.
      
          The current notifications on entry to NUD_STALE have somewhat
          limited usefulness: NUD_STALE is a perfectly normal state, as is
          NUD_DELAY, whereas notifications on entry to NUD_FAILURE come after
          a neighbor reachability problem has been confirmed (typically after
          three probes).
      Signed-off-by: NErik Kline <ek@google.com>
      Acked-By: NLorenzo Colitti <lorenzo@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      765c9c63
  24. 21 3月, 2015 1 次提交
  25. 13 3月, 2015 1 次提交
  26. 09 3月, 2015 1 次提交
  27. 04 3月, 2015 2 次提交
    • E
      neigh: Add helper function neigh_xmit · 4fd3d7d9
      Eric W. Biederman 提交于
      For MPLS I am building the code so that either the neighbour mac
      address can be specified or we can have a next hop in ipv4 or ipv6.
      
      The kind of next hop we have is indicated by the neighbour table
      pointer.  A neighbour table pointer of NULL is a link layer address.
      A non-NULL neighbour table pointer indicates which neighbour table and
      thus which address family the next hop address is in that we need to
      look up.
      
      The code either sends a packet directly or looks up the appropriate
      neighbour table entry and sends the packet.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fd3d7d9
    • E
      neigh: Factor out ___neigh_lookup_noref · 60395a20
      Eric W. Biederman 提交于
      While looking at the mpls code I found myself writing yet another
      version of neigh_lookup_noref.  We currently have __ipv4_lookup_noref
      and __ipv6_lookup_noref.
      
      So to make my work a little easier and to make it a smidge easier to
      verify/maintain the mpls code in the future I stopped and wrote
      ___neigh_lookup_noref.  Then I rewote __ipv4_lookup_noref and
      __ipv6_lookup_noref in terms of this new function.  I tested my new
      version by verifying that the same code is generated in
      ip_finish_output2 and ip6_finish_output2 where these functions are
      inlined.
      
      To get to ___neigh_lookup_noref I added a new neighbour cache table
      function key_eq.  So that the static size of the key would be
      available.
      
      I also added __neigh_lookup_noref for people who want to to lookup
      a neighbour table entry quickly but don't know which neibhgour table
      they are going to look up.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60395a20
  28. 03 3月, 2015 3 次提交
  29. 19 1月, 2015 1 次提交
  30. 18 1月, 2015 1 次提交
    • J
      netlink: make nlmsg_end() and genlmsg_end() void · 053c095a
      Johannes Berg 提交于
      Contrary to common expectations for an "int" return, these functions
      return only a positive value -- if used correctly they cannot even
      return 0 because the message header will necessarily be in the skb.
      
      This makes the very common pattern of
      
        if (genlmsg_end(...) < 0) { ... }
      
      be a whole bunch of dead code. Many places also simply do
      
        return nlmsg_end(...);
      
      and the caller is expected to deal with it.
      
      This also commonly (at least for me) causes errors, because it is very
      common to write
      
        if (my_function(...))
          /* error condition */
      
      and if my_function() does "return nlmsg_end()" this is of course wrong.
      
      Additionally, there's not a single place in the kernel that actually
      needs the message length returned, and if anyone needs it later then
      it'll be very easy to just use skb->len there.
      
      Remove this, and make the functions void. This removes a bunch of dead
      code as described above. The patch adds lines because I did
      
      -	return nlmsg_end(...);
      +	nlmsg_end(...);
      +	return 0;
      
      I could have preserved all the function's return values by returning
      skb->len, but instead I've audited all the places calling the affected
      functions and found that none cared. A few places actually compared
      the return value with <= 0 in dump functionality, but that could just
      be changed to < 0 with no change in behaviour, so I opted for the more
      efficient version.
      
      One instance of the error I've made numerous times now is also present
      in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
      check for <0 or <=0 and thus broke out of the loop every single time.
      I've preserved this since it will (I think) have caused the messages to
      userspace to be formatted differently with just a single message for
      every SKB returned to userspace. It's possible that this isn't needed
      for the tools that actually use this, but I don't even know what they
      are so couldn't test that changing this behaviour would be acceptable.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      053c095a
  31. 14 1月, 2015 1 次提交
    • J
      neighbour: fix base_reachable_time(_ms) not effective immediatly when changed · 4bf6980d
      Jean-Francois Remy 提交于
      When setting base_reachable_time or base_reachable_time_ms on a
      specific interface through sysctl or netlink, the reachable_time
      value is not updated.
      
      This means that neighbour entries will continue to be updated using the
      old value until it is recomputed in neigh_period_work (which
          recomputes the value every 300*HZ).
      On systems with HZ equal to 1000 for instance, it means 5mins before
      the change is effective.
      
      This patch changes this behavior by recomputing reachable_time after
      each set on base_reachable_time or base_reachable_time_ms.
      The new value will become effective the next time the neighbour's timer
      is triggered.
      
      Changes are made in two places: the netlink code for set and the sysctl
      handling code. For sysctl, I use a proc_handler. The ipv6 network
      code does provide its own handler but it already refreshes
      reachable_time correctly so it's not an issue.
      Any other user of neighbour which provide its own handlers must
      refresh reachable_time.
      Signed-off-by: NJean-Francois Remy <jeff@melix.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4bf6980d
  32. 12 11月, 2014 1 次提交
  33. 30 10月, 2014 1 次提交
  34. 29 7月, 2014 1 次提交
  35. 15 7月, 2014 1 次提交