1. 24 Dec, 2016 1 commit
    • neigh: Send netevent after marking neigh as dead · 53f800e3
      Ido Schimmel committed
      neigh_cleanup_and_release() is always called after marking a neighbour
      as dead, but it only notifies user space and not in-kernel listeners of
      the netevent notification chain.
      
      This can cause multiple problems. In my specific use case, it causes the
      listener (a switch driver capable of L3 offloads) to believe a neighbour
      entry is still valid, and is thus erroneously kept in the device's
      table.
      
      Fix that by sending a netevent after marking the neighbour as dead.
      
      Fixes: a6bf9e93 ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 01 Dec, 2016 1 commit
    • neigh: remove duplicate check for same neigh · 18502acd
      Zhang Shengju committed
      Currently the loop index 'idx' is used as the index into the neigh list of interest.
      It is increased only when a neigh is dumped, so it is not the absolute index in
      the list. Because nothing records which neighs have already been scanned
      by a previous loop, the filtered-out neighs are scanned multiple
      times.
      
      This patch makes idx the absolute index in the list: it increases no matter
      whether the neigh is filtered. This prevents the above problem.
      
      And this is in line with other dump functions.
      
      v2:
       - take David Ahern's advice to do simple change
      Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
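The absolute-index scheme described in the 18502acd message can be sketched in plain C. This is a minimal user-space model, not the kernel's dump function: `struct entry`, `dump_entries()`, the `budget` parameter (standing in for a full skb) and the `ifindex` filter are all hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a neighbour entry; only the filtered field. */
struct entry {
    int ifindex;
};

/*
 * Walk `list`, skipping entries handled by earlier calls (idx < start)
 * and entries rejected by the filter. idx advances for every entry,
 * filtered or not, so the return value is an absolute position that a
 * later call can resume from. Stops early once `budget` entries have
 * been emitted, which models running out of room in the reply buffer.
 */
static size_t dump_entries(const struct entry *list, size_t n, size_t start,
                           int want_ifindex, size_t budget,
                           int *out, size_t *n_out)
{
    size_t idx;

    *n_out = 0;
    for (idx = 0; idx < n; idx++) {
        if (idx < start)
            continue;           /* already handled by a previous call */
        if (list[idx].ifindex != want_ifindex)
            continue;           /* filtered out, but idx still advances */
        if (*n_out == budget)
            break;              /* no room left: resume from idx next time */
        out[(*n_out)++] = (int)idx;
    }
    return idx;
}
```

Because the returned position counts filtered entries too, a resumed call never has to re-evaluate the filter on entries it has already passed.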
  3. 09 Aug, 2016 1 commit
    • neigh: allow admin to set NUD_STALE · 0e7bbcc1
      Julian Anastasov committed
      Admin should be able to set any state. Currently, this fails
      when lladdr is not changed and state is changed from
      NUD_CONNECTED to NUD_STALE:
      
      ip neigh add 192.168.8.1 lladdr 00:11:22:33:44:55 nud perm dev wlan0
      ip neigh show to 192.168.8.1
      192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT
      ip neigh change 192.168.8.1 lladdr 00:11:22:33:44:55 nud stale dev wlan0
      ip neigh show to 192.168.8.1
      192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT
      
      Problem may be from 2.1.X days.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Reviewed-by: Chunhui He <hchunhui@mail.ustc.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 27 Jul, 2016 1 commit
    • net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update() · d1c2b501
      He Chunhui committed
      NUD_STALE is used when the caller (e.g. arp_process()) can't guarantee
      neighbour reachability. If the entry was NUD_VALID and lladdr is unchanged,
      the entry state should not be changed.
      
      Currently the code puts an extra "NUD_CONNECTED" condition. So if old state
      was NUD_DELAY or NUD_PROBE (they are NUD_VALID but not NUD_CONNECTED), the
      state can be changed to NUD_STALE.
      
      This may cause a problem. Because a NUD_STALE lladdr doesn't guarantee
      reachability, when we send traffic, the state will be changed to
      NUD_DELAY. In the normal case, if we get no confirmation (by dst_confirm()),
      we will change the state to NUD_PROBE and send probe traffic. But now the
      state may be reset to NUD_STALE again (e.g. by broadcast ARP packets),
      so the probe traffic will not be sent. This situation may happen again and
      again, and packets will be sent to a non-reachable lladdr forever.
      
      The fix is to remove the "NUD_CONNECTED" condition. After that the
      "NEIGH_UPDATE_F_WEAK_OVERRIDE" condition (used by IPv6) in that branch will
      be redundant, so remove it.
      
      This change may increase probe traffic, but it's essential since NUD_STALE
      lladdr is unreliable. To ensure correctness, we prefer to resolve lladdr,
      when we can't get confirmation, even while remote packets try to set
      NUD_STALE state.
      Signed-off-by: Chunhui He <hchunhui@mail.ustc.edu.cn>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
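The condition change described in the d1c2b501 message can be modelled outside the kernel. The sketch below is a simplified assumption of the decision only: the NUD_* bit values follow include/uapi/linux/neighbour.h, but `next_state_before()` and `next_state_after()` are hypothetical helpers, and the NEIGH_UPDATE_F_WEAK_OVERRIDE flag mentioned in the message is left out.

```c
#include <stdbool.h>

/* NUD state bits, as in include/uapi/linux/neighbour.h. */
#define NUD_INCOMPLETE 0x01
#define NUD_REACHABLE  0x02
#define NUD_STALE      0x04
#define NUD_DELAY      0x08
#define NUD_PROBE      0x10
#define NUD_FAILED     0x20
#define NUD_NOARP      0x40
#define NUD_PERMANENT  0x80

#define NUD_CONNECTED (NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE)
#define NUD_VALID     (NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE | \
                       NUD_PROBE | NUD_STALE | NUD_DELAY)

/* Before the fix: the old state is kept only if it was NUD_CONNECTED. */
static int next_state_before(int old, int requested, bool lladdr_same)
{
    if ((old & NUD_VALID) && lladdr_same && requested == NUD_STALE &&
        (old & NUD_CONNECTED))
        return old;
    return requested;
}

/* After the fix: any NUD_VALID state with an unchanged lladdr is kept. */
static int next_state_after(int old, int requested, bool lladdr_same)
{
    if ((old & NUD_VALID) && lladdr_same && requested == NUD_STALE)
        return old;
    return requested;
}
```

With an unchanged lladdr, the "before" model lets NUD_PROBE fall back to NUD_STALE, which is the loop the message describes; the "after" model keeps NUD_PROBE so the probe is still sent.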
  5. 06 Jul, 2016 2 commits
    • neigh: Send a notification when DELAY_PROBE_TIME changes · 2a4501ae
      Ido Schimmel committed
      When the data plane is offloaded the traffic doesn't go through the
      networking stack. Therefore, after first resolving a neighbour the NUD
      state machine will transition it from REACHABLE to STALE until it's
      finally deleted by the garbage collector.
      
      To prevent such situations the offloading driver should notify the NUD
      state machine on any neighbours that were recently used. The driver's
      polling interval should be set so that the NUD state machine can
      function as if the traffic wasn't offloaded.
      
      Currently, there are no in-tree drivers that can report confirmation for
      a neighbour, but only 'used' indication. Therefore, the polling interval
      should be set according to DELAY_FIRST_PROBE_TIME, as a neighbour will
      transition from REACHABLE state to DELAY (instead of STALE) if "a packet
      was sent within the last DELAY_FIRST_PROBE_TIME seconds" (RFC 4861).
      
      Send a netevent whenever the DELAY_FIRST_PROBE_TIME changes - either via
      netlink or sysctl - so that offloading drivers can correctly set their
      polling interval.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add dev arg to ndo_neigh_construct/destroy · 503eebc2
      Jiri Pirko committed
      As the following patch will allow upper devices to follow the call down
      lower devices, we need to add dev here and not rely on n->dev.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 29 Jun, 2016 1 commit
    • neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit() · b560f03d
      David Barroso committed
      neigh_xmit() expects to be called inside an RCU-bh read side critical
      section, and while one of its two current callers gets this right, the
      other one doesn't.
      
      More specifically, neigh_xmit() has two callers, mpls_forward() and
      mpls_output(), and while both callers call neigh_xmit() under
      rcu_read_lock(), this provides sufficient protection for neigh_xmit()
      only in the case of mpls_forward(), as that is always called from
      softirq context and therefore doesn't need explicit BH protection,
      while mpls_output() can be called from process context with softirqs
      enabled.
      
      When mpls_output() is called from process context, with softirqs
      enabled, we can be preempted by a softirq at any time, and RCU-bh
      considers the completion of a softirq as signaling the end of any
      pending read-side critical sections, so if we do get a softirq
      while we are in the part of neigh_xmit() that expects to be run inside
      an RCU-bh read side critical section, we can end up with an unexpected
      RCU grace period running right in the middle of that critical section,
      making things go boom.
      
      This patch fixes this impedance mismatch in the callee, by making
      neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
      expects to be treated as an RCU-bh read side critical section, as this
      seems a safer option than fixing it in the callers.
      
      Fixes: 4fd3d7d9 ("neigh: Add helper function neigh_xmit")
      Signed-off-by: David Barroso <dbarroso@fastly.com>
      Signed-off-by: Lennert Buytenhek <lbuytenhek@fastly.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 27 Apr, 2016 1 commit
  8. 24 Apr, 2016 1 commit
  9. 03 Dec, 2015 1 commit
  10. 18 Nov, 2015 1 commit
  11. 07 Oct, 2015 1 commit
  12. 30 Sep, 2015 1 commit
  13. 11 Aug, 2015 1 commit
    • net: add explicit logging and stat for neighbour table overflow · fb811395
      Rick Jones committed
      Add an explicit neighbour table overflow message (ratelimited) and
      statistic to make diagnosing neighbour table overflows tractable in
      the wild.
      
      Diagnosing a neighbour table overflow can be quite difficult in the wild
      because there is no explicit dmesg logged.  Callers to neighbour code
      seem to use net_dbg_ratelimit when the neighbour call fails which means
      the "base message" is not emitted and the callback suppressed messages
      from the ratelimiting can end-up juxtaposed with unrelated messages.
      Further, a forced garbage collection will increment a stat on each call
      whether it was successful in freeing-up a table entry or not, so that
      statistic is only a hint.  So, add a net_info_ratelimited message and
      explicit statistic to the neighbour code.
      Signed-off-by: Rick Jones <rick.jones2@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 22 Jun, 2015 1 commit
    • neigh: do not modify unlinked entries · 2c51a97f
      Julian Anastasov committed
      The lockless lookups can return an entry that is unlinked.
      Sometimes they get a reference before the last neigh_cleanup_and_release,
      sometimes they do not need a reference. Later, any
      modification attempts may result in the following problems:
      
      1. The entry is not destroyed immediately because neigh_update
      can start the timer for a dead entry, e.g. on change to NUD_REACHABLE
      state. As a result, the entry lives for some time but is invisible
      and out of control.
      
      2. __neigh_event_send can run in parallel with neigh_destroy
      while refcnt=0, but if the timer is started and expires, refcnt can
      reach 0 for a second time, leading to a second neigh_destroy and
      a possible crash.
      
      Thanks to Eric Dumazet and Ying Xue for their work and analysis
      on the __neigh_event_send change.
      
      Fixes: 767e97e1 ("neigh: RCU conversion of struct neighbour")
      Fixes: a263b309 ("ipv4: Make neigh lookups directly in output packet path.")
      Fixes: 6fd6ce20 ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 22 May, 2015 1 commit
    • neigh: Better handling of transition to NUD_PROBE state · 765c9c63
      Erik Kline committed
      [1] When entering NUD_PROBE state via neigh_update(), perhaps received
          from userspace, correctly (re)initialize the probes count to zero.
      
          This is useful for forcing revalidation of a neighbor (for example
          if the host is attempting to do DNA [IPv4 4436, IPv6 6059]).
      
      [2] Notify listeners when a neighbor goes into NUD_PROBE state.
      
          By sending notifications on entry to NUD_PROBE state listeners get
          more timely warnings of imminent connectivity issues.
      
          The current notifications on entry to NUD_STALE have somewhat
          limited usefulness: NUD_STALE is a perfectly normal state, as is
          NUD_DELAY, whereas notifications on entry to NUD_FAILED come after
          a neighbor reachability problem has been confirmed (typically after
          three probes).
      Signed-off-by: Erik Kline <ek@google.com>
      Acked-By: Lorenzo Colitti <lorenzo@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 21 Mar, 2015 1 commit
  17. 13 Mar, 2015 1 commit
  18. 09 Mar, 2015 1 commit
  19. 04 Mar, 2015 2 commits
    • neigh: Add helper function neigh_xmit · 4fd3d7d9
      Eric W. Biederman committed
      For MPLS I am building the code so that either the neighbour mac
      address can be specified or we can have a next hop in ipv4 or ipv6.
      
      The kind of next hop we have is indicated by the neighbour table
      pointer.  A neighbour table pointer of NULL is a link layer address.
      A non-NULL neighbour table pointer indicates which neighbour table and
      thus which address family the next hop address is in that we need to
      look up.
      
      The code either sends a packet directly or looks up the appropriate
      neighbour table entry and sends the packet.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neigh: Factor out ___neigh_lookup_noref · 60395a20
      Eric W. Biederman committed
      While looking at the mpls code I found myself writing yet another
      version of neigh_lookup_noref.  We currently have __ipv4_lookup_noref
      and __ipv6_lookup_noref.
      
      So to make my work a little easier and to make it a smidge easier to
      verify/maintain the mpls code in the future I stopped and wrote
      ___neigh_lookup_noref.  Then I rewrote __ipv4_lookup_noref and
      __ipv6_lookup_noref in terms of this new function.  I tested my new
      version by verifying that the same code is generated in
      ip_finish_output2 and ip6_finish_output2 where these functions are
      inlined.
      
      To get to ___neigh_lookup_noref I added a new neighbour cache table
      function, key_eq, so that the static size of the key would be
      available.
      
      I also added __neigh_lookup_noref for people who want to look up
      a neighbour table entry quickly but don't know which neighbour table
      they are going to look up.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 03 Mar, 2015 3 commits
  21. 19 Jan, 2015 1 commit
  22. 18 Jan, 2015 1 commit
    • netlink: make nlmsg_end() and genlmsg_end() void · 053c095a
      Johannes Berg committed
      Contrary to common expectations for an "int" return, these functions
      return only a positive value -- if used correctly they cannot even
      return 0 because the message header will necessarily be in the skb.
      
      This makes the very common pattern of
      
        if (genlmsg_end(...) < 0) { ... }
      
      be a whole bunch of dead code. Many places also simply do
      
        return nlmsg_end(...);
      
      and the caller is expected to deal with it.
      
      This also commonly (at least for me) causes errors, because it is very
      common to write
      
        if (my_function(...))
          /* error condition */
      
      and if my_function() does "return nlmsg_end()" this is of course wrong.
      
      Additionally, there's not a single place in the kernel that actually
      needs the message length returned, and if anyone needs it later then
      it'll be very easy to just use skb->len there.
      
      Remove this, and make the functions void. This removes a bunch of dead
      code as described above. The patch adds lines because I did
      
      -	return nlmsg_end(...);
      +	nlmsg_end(...);
      +	return 0;
      
      I could have preserved all the function's return values by returning
      skb->len, but instead I've audited all the places calling the affected
      functions and found that none cared. A few places actually compared
      the return value with <= 0 in dump functionality, but that could just
      be changed to < 0 with no change in behaviour, so I opted for the more
      efficient version.
      
      One instance of the error I've made numerous times now is also present
      in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
      check for <0 or <=0 and thus broke out of the loop every single time.
      I've preserved this since it will (I think) have caused the messages to
      userspace to be formatted differently with just a single message for
      every SKB returned to userspace. It's possible that this isn't needed
      for the tools that actually use this, but I don't even know what they
      are so couldn't test that changing this behaviour would be acceptable.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
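The pitfall described in the 053c095a message (a success path that returns a positive length being read as an error) can be reproduced with a small user-space model. Everything here is hypothetical and only mimics the shape of the API: `struct msg`, `HDR_LEN`, `finish_old()`, `finish_new()`, `fill_old()`, `fill_new()` and `caller_saw_error()` are illustration names, not kernel functions.

```c
#include <stddef.h>

#define HDR_LEN 16  /* assumed header size; a finished message is never empty */

/* Hypothetical stand-in for a message under construction. */
struct msg {
    size_t payload_len;
};

/* Old style: finishing the message returns its total, always positive, length. */
static int finish_old(const struct msg *m)
{
    return (int)(HDR_LEN + m->payload_len);
}

/* New style: finishing the message returns nothing. */
static void finish_new(const struct msg *m)
{
    (void)m;
}

/* Old fill helper: forwards the positive length as its "result". */
static int fill_old(const struct msg *m)
{
    return finish_old(m);
}

/* New fill helper: the pattern from the patch, finish and then return 0. */
static int fill_new(const struct msg *m)
{
    finish_new(m);
    return 0;
}

/* Models the common caller idiom `if (my_function(...)) { error }`. */
static int caller_saw_error(int ret)
{
    return ret ? 1 : 0;
}
```

With the old helper a successful fill is reported as an error by the common idiom, while the new helper reports success as 0.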
  23. 14 Jan, 2015 1 commit
    • neighbour: fix base_reachable_time(_ms) not effective immediately when changed · 4bf6980d
      Jean-Francois Remy committed
      When setting base_reachable_time or base_reachable_time_ms on a
      specific interface through sysctl or netlink, the reachable_time
      value is not updated.
      
      This means that neighbour entries will continue to be updated using the
      old value until it is recomputed in neigh_period_work (which
      recomputes the value every 300*HZ).
      On systems with HZ equal to 1000 for instance, it means 5 mins before
      the change is effective.
      
      This patch changes this behavior by recomputing reachable_time after
      each set on base_reachable_time or base_reachable_time_ms.
      The new value will become effective the next time the neighbour's timer
      is triggered.
      
      Changes are made in two places: the netlink code for set and the sysctl
      handling code. For sysctl, I use a proc_handler. The ipv6 network
      code does provide its own handler but it already refreshes
      reachable_time correctly so it's not an issue.
      Any other user of neighbour which provides its own handlers must
      refresh reachable_time.
      Signed-off-by: Jean-Francois Remy <jeff@melix.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
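The "300*HZ" interval in the 4bf6980d message is expressed in jiffies, so the wall-clock delay it describes can be checked with a short calculation. The helpers below are hypothetical illustration names, and `hz` is passed as a parameter rather than taken from a kernel configuration.

```c
/* Interval, in jiffies, between recomputations as quoted in the message. */
static unsigned long recompute_period_jiffies(unsigned long hz)
{
    return 300UL * hz;
}

/* Convert a jiffies interval to whole seconds for a given tick rate. */
static unsigned long jiffies_to_seconds(unsigned long jiffies, unsigned long hz)
{
    return jiffies / hz;
}
```

Since the period scales with the tick rate, the conversion gives 300 seconds (5 minutes) for HZ=1000 and equally for other tick rates such as 100 or 250.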
  24. 12 Nov, 2014 1 commit
  25. 30 Oct, 2014 1 commit
  26. 29 Jul, 2014 1 commit
  27. 15 Jul, 2014 1 commit
  28. 14 May, 2014 1 commit
    • neigh: set nud_state to NUD_INCOMPLETE when probing router reachability · 2176d5d4
      Duan Jiong committed
      Since commit 7e980569 ("ipv6: router reachability probing"), a router that falls
      into NUD_FAILED will be probed.
      
      Now if function rt6_select() selects a router whose neighbour state is NUD_FAILED,
      and at the same time function rt6_probe() changes the neighbour state to NUD_PROBE,
      then function dst_neigh_output() can directly send packets, but actually the
      neighbour is still unreachable. If we set nud_state to NUD_INCOMPLETE instead of
      NUD_PROBE, packets will not be sent out until the neighbour is reachable.
      
      In addition, because the route should be probed with a single NS, we must
      set neigh->probes to neigh_max_probes(), so that when the neigh timer times out, function
      neigh_timer_handler() will not send other NS messages.
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 28 Feb, 2014 2 commits
  30. 27 Feb, 2014 1 commit
  31. 22 Feb, 2014 1 commit
  32. 23 Jan, 2014 1 commit
    • net/neighbour: queue work on power efficient wq · f618002b
      viresh kumar committed
      Workqueues used in the neighbour layer have no real dependency on being scheduled on
      the cpu which queued them.
      
      On an idle system, it is observed that an idle cpu wakes up many times just to
      service this work. It would be better if we could schedule it on a cpu which the
      scheduler believes to be the most appropriate one.
      
      This patch replaces normal workqueues with power efficient versions. This
      doesn't change existing behavior of code unless CONFIG_WQ_POWER_EFFICIENT is
      enabled.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  33. 16 Jan, 2014 1 commit
  34. 15 Jan, 2014 1 commit
  35. 01 Jan, 2014 1 commit
    • vlan: Fix header ops passthru when doing TX VLAN offload. · 2205369a
      David S. Miller committed
      When the vlan code detects that the real device can do TX VLAN offloads
      in hardware, it tries to arrange for the real device's header_ops to
      be invoked directly.
      
      But it does so illegally, by simply hooking the real device's
      header_ops up to the VLAN device.
      
      This doesn't work because we will end up invoking a set of header_ops
      routines which expect a device type which matches the real device, but
      will see a VLAN device instead.
      
      Fix this by providing a pass-thru set of header_ops which will arrange
      to pass the proper real device instead.
      
      To facilitate this add a dev_rebuild_header().  There are
      implementations which provide a ->cache and ->create but not a
      ->rebuild (f.e. PLIP).  So we need a helper function just like
      dev_hard_header() to avoid crashes.
      
      Use this helper in the one existing place where the
      header_ops->rebuild was being invoked, the neighbour code.
      
      With lots of help from Florian Westphal.
      Signed-off-by: David S. Miller <davem@davemloft.net>