1. 04 Nov, 2012 1 commit
  2. 23 Oct, 2012 1 commit
    • ipv6: add support of equal cost multipath (ECMP) · 51ebd318
      Nicolas Dichtel committed
      Each nexthop is added like a single route in the routing table. All routes
      that have the same metric/weight and destination but not the same gateway
      are considered ECMP routes. They are linked together through a list called
      rt6i_siblings.
      
      ECMP routes can be added in one shot, with the RTA_MULTIPATH attribute, or one
      after the other (in both cases, the flag NLM_F_EXCL should not be set).
      
      The patch is based on a previous work from
      Luc Saillard <luc.saillard@6wind.com>.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
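The sibling rule stated in the ECMP commit above can be sketched in C. This is a minimal illustration under stated assumptions: `struct route` and `is_ecmp_sibling()` are hypothetical names invented here, not the kernel's `struct rt6_info` or any kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified route record (not the kernel's struct rt6_info). */
struct route {
    unsigned char dst[16];     /* destination prefix */
    int           plen;        /* prefix length */
    unsigned int  metric;      /* route metric/weight */
    unsigned char gateway[16]; /* nexthop gateway */
};

/* Two routes are treated as ECMP siblings when destination and metric
 * match but the gateways differ. */
static bool is_ecmp_sibling(const struct route *a, const struct route *b)
{
    assert(a && b);
    return a->plen == b->plen &&
           memcmp(a->dst, b->dst, sizeof(a->dst)) == 0 &&
           a->metric == b->metric &&
           memcmp(a->gateway, b->gateway, sizeof(a->gateway)) != 0;
}
```

Under this rule a route is never its own sibling, and changing only the metric of an otherwise identical route also breaks the sibling relation.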
  3. 19 Sep, 2012 1 commit
  4. 06 Sep, 2012 1 commit
  5. 05 Jul, 2012 1 commit
  6. 16 Jun, 2012 2 commits
  7. 11 Jun, 2012 2 commits
  8. 18 Apr, 2012 2 commits
    • ipv6: clean up rt6_clean_expires · cda31e10
      Jiri Bohac committed
      Functionally, this change is a NOP.
      
      Semantically, rt6_clean_expires() wants to do rt->dst.from = NULL instead of
      rt->dst.expires = 0. It is clearing the RTF_EXPIRES flag, so the union is going
      to be treated as a pointer (dst.from) not a long (dst.expires).
      Signed-off-by: Jiri Bohac <jbohac@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
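The union discussed in the rt6_clean_expires commit above can be modelled in a few lines of C. This is a sketch under assumptions: `struct dst`, its member names, and `clean_expires()` are simplified stand-ins invented here, and the flag value is only an illustrative bit.

```c
#include <assert.h>
#include <stddef.h>

#define RTF_EXPIRES 0x00400000u  /* illustrative flag bit for this sketch */

/* Simplified stand-in for the dst entry: one slot, two meanings. */
struct dst {
    union {
        unsigned long expires; /* meaningful only while RTF_EXPIRES is set */
        struct dst   *from;    /* meaningful only while RTF_EXPIRES is clear */
    } u;
    unsigned int flags;
};

/* Clearing the flag switches the slot to its pointer meaning, so the
 * slot is reset through the pointer member rather than the integer one. */
static void clean_expires(struct dst *d)
{
    assert(d);
    d->flags &= ~RTF_EXPIRES;
    d->u.from = NULL;
}
```

Writing `NULL` through `from` rather than `0` through `expires` states which interpretation of the slot is in force after the flag is cleared, which is the semantic point the commit makes.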
    • ipv6: fix rt6_update_expires · edfb5d46
      Jiri Bohac committed
      Commit 1716a961 (ipv6: fix problem with expired dst cache) broke PMTU
      discovery. rt6_update_expires() calls dst_set_expires(), which only updates
      dst->expires if it has not been set previously (expires == 0) or if the new
      expires is earlier than the current dst->expires.
      
      rt6_update_expires() needs to zero rt->dst.expires, otherwise it will contain
      invalid data left over from rt->dst.from and will confuse dst_set_expires().
      Signed-off-by: Jiri Bohac <jbohac@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
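The interaction described in the rt6_update_expires commit above can be reproduced with a small C model. It is a sketch only: `set_expires()` and `update_expires()` are hypothetical simplifications written for this example, time is a plain counter, and the comparison ignores counter wraparound.

```c
#include <assert.h>

/* Simplified model of the rule described for dst_set_expires(): the stored
 * deadline changes only if none is set yet (0) or the new one is earlier. */
static void set_expires(unsigned long *expires, unsigned long now,
                        unsigned long timeout)
{
    unsigned long deadline = now + timeout;

    if (deadline == 0)
        deadline = 1;               /* 0 is reserved for "not set" */
    if (*expires == 0 || deadline < *expires)
        *expires = deadline;
}

/* Sketch of the fix: if the slot did not hold a deadline before
 * (has_expires == 0), it may contain leftover pointer bits, so it is
 * zeroed before the "only move earlier" rule is applied. */
static void update_expires(unsigned long *slot, int *has_expires,
                           unsigned long now, unsigned long timeout)
{
    assert(slot && has_expires);
    if (!*has_expires)
        *slot = 0;
    set_expires(slot, now, timeout);
    *has_expires = 1;
}
```

Without the zeroing step, leftover bits that happen to compare lower than the new deadline would be kept as if they were an earlier expiry.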
  9. 14 Apr, 2012 1 commit
    • ipv6: fix problem with expired dst cache · 1716a961
      Gao feng committed
      If an IPv6 dst cache entry is copied from a dst generated by an ICMPv6 RA
      packet, the copy is never checked for expiry because it has no RTF_EXPIRES
      flag. The entry therefore stays in use until the dst gc runs.
      
      Change struct dst_entry to add a union holding a new pointer, from, alongside
      expires. When rt6_info.rt6i_flags has no RTF_EXPIRES flag, dst.expires is
      unused, so the field can instead point to the dst the cache entry was copied
      from. dst.from is only used in IPv6.
      
      rt6_check_expired() checks whether rt6_info.dst.from has expired.
      
      ip6_rt_copy() only sets dst.from when the ort has the RTF_ADDRCONF and
      RTF_DEFAULT flags, and then holds the ort.
      
      ip6_dst_destroy() releases the ort.
      
      Add functions that operate on the RTF_EXPIRES flag and expires (from)
      together, and change the code to use these new functions.
      
      Changes from v5:
      Modify ip6_route_add and ndisc_router_discovery to use the new functions.
      
      Only set dst.from when the ort has the RTF_ADDRCONF and RTF_DEFAULT flags,
      and then hold the ort.
      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 31 Dec, 2011 1 commit
    • IPv6: Avoid taking write lock for /proc/net/ipv6_route · 32b293a5
      Josh Hunt committed
      During some debugging I needed to look into how /proc/net/ipv6_route
      operated, and in my digging I found it calls fib6_clean_all(), which uses
      "write_lock_bh(&table->tb6_lock)" before doing the walk of the table. I
      found this on 2.6.32, but reading the code I believe the same basic idea
      exists currently. The rtnetlink code, by contrast, only calls
      "read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize
      reading from proc isn't the recommended way of fetching the ipv6 route
      table, taking a write lock seems unnecessary and would probably cause
      network performance issues.
      
      To verify this I loaded up the ipv6 route table and then ran iperf in 3
      cases:
        * doing nothing
        * reading ipv6 route table via proc
          (while :; do cat /proc/net/ipv6_route > /dev/null; done)
        * reading ipv6 route table via rtnetlink
          (while :; do ip -6 route show table all > /dev/null; done)
      
      * Load the ipv6 route table up with:
        * for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done
      
      * iperf commands:
        * client: iperf -i 1 -V -c <ipv6 addr>
        * server: iperf -V -s
      
      * iperf results - 3 runs each (in Mbits/sec)
        * nothing: client: 927,927,927 server: 927,927,927
        * proc: client: 179,97,96,113 server: 142,112,133
        * iproute: client: 928,927,928 server: 927,927,927
      
      lock_stat shows taking the write lock is causing the slowdown. Using this
      info I decided to write a version of fib6_clean_all() which replaces
      write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With
      this new function I see the same results as with my rtnetlink iperf test.
      Signed-off-by: Josh Hunt <joshhunt00@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 29 Dec, 2011 1 commit
  12. 18 Jul, 2011 1 commit
  13. 25 Apr, 2011 1 commit
  14. 23 Apr, 2011 1 commit
  15. 16 Apr, 2011 1 commit
  16. 13 Mar, 2011 1 commit
  17. 11 Feb, 2011 1 commit
    • inet: Create a mechanism for upward inetpeer propagation into routes. · 6431cbc2
      David S. Miller committed
      If we didn't have a routing cache, we would not be able to properly
      propagate certain kinds of dynamic path attributes, for example
      PMTU information and redirects.
      
      The reason is that if we didn't have a routing cache, then there would
      be no way to lookup all of the active cached routes hanging off of
      sockets, tunnels, IPSEC bundles, etc.
      
      Consider the case where we created a cached route, but no inetpeer
      entry existed and also we were not asked to pre-COW the route metrics
      and therefore did not force the creation of a new inetpeer entry.
      
      If we later get a PMTU message, or a redirect, and store this
      information in a new inetpeer entry, there is no way to teach that
      cached route about the newly existing inetpeer entry.
      
      The facilities implemented here handle this problem.
      
      First we create a generation ID.  When we create a cached route of any
      kind, we remember the generation ID at the time of attachment.  Any
      time we force-create an inetpeer entry in response to new path
      information, we bump that generation ID.
      
      The dst_ops->check() callback is where the knowledge of this event
      is propagated.  If the global generation ID does not equal the one
      stored in the cached route, and the cached route has not attached
      to an inetpeer yet, we look it up and attach if one is found.  Now
      that we've updated the cached route's information, we update the
      route's generation ID too.
      
      This clears the way for implementing PMTU and redirects directly in
      the inetpeer cache.  There is absolutely no need to consult cached
      route information in order to maintain this information.
      
      At this point nothing bumps the inetpeer genids; that comes in the
      later changes which handle PMTUs and redirects using inetpeers.
      Signed-off-by: David S. Miller <davem@davemloft.net>
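The generation-ID scheme described in the inetpeer commit above can be sketched as follows. Everything here is a hypothetical model for illustration: `struct peer`, `struct cached_route`, the one-entry `peer_slot` standing in for the peer lookup, and the function names are assumptions, not the kernel's API.

```c
#include <assert.h>
#include <stddef.h>

struct peer { unsigned int pmtu; };   /* hypothetical inetpeer stand-in */

struct cached_route {
    unsigned int genid;  /* generation seen when last attached/validated */
    struct peer *peer;   /* NULL until a peer entry is attached */
};

static unsigned int peer_genid;       /* global generation ID */
static struct peer *peer_slot;        /* one-entry stand-in for peer lookup */

/* Force-creating a peer entry for new path information bumps the ID. */
static void peer_created(struct peer *p)
{
    peer_slot = p;
    peer_genid++;
}

/* Sketch of the check() step: on a generation mismatch, a route with no
 * peer attached looks one up, then records the current generation. */
static void route_check(struct cached_route *rt)
{
    assert(rt);
    if (rt->genid != peer_genid) {
        if (!rt->peer)
            rt->peer = peer_slot;
        rt->genid = peer_genid;
    }
}
```

A route created before any peer existed picks the peer up on its next check, without anything having to enumerate all cached routes.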
  18. 01 Dec, 2010 1 commit
  19. 11 Jun, 2010 1 commit
  20. 02 Apr, 2010 1 commit
  21. 19 Feb, 2010 1 commit
  22. 13 Feb, 2010 1 commit
    • ipv6: fib: fix crash when changing large fib while dumping it · 2bec5a36
      Patrick McHardy committed
      When the fib size exceeds what can be dumped in a single skb, the
      dump is suspended and resumed once the last skb has been received
      by userspace. When the fib is changed while the dump is suspended,
      the walker might contain stale pointers, causing a crash when the
      dump is resumed.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      IP: [<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
      PGD 5347a067 PUD 65c7067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      ...
      RIP: 0010:[<ffffffffa01bce04>]
      [<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
      ...
      Call Trace:
       [<ffffffff8104aca3>] ? mutex_spin_on_owner+0x59/0x71
       [<ffffffffa01bd105>] inet6_dump_fib+0x11b/0x1b9 [ipv6]
       [<ffffffff81371af4>] netlink_dump+0x5b/0x19e
       [<ffffffff8134f288>] ? consume_skb+0x28/0x2a
       [<ffffffff81373b69>] netlink_recvmsg+0x1ab/0x2c6
       [<ffffffff81372781>] ? netlink_unicast+0xfa/0x151
       [<ffffffff813483e0>] __sock_recvmsg+0x6d/0x79
       [<ffffffff81348a53>] sock_recvmsg+0xca/0xe3
       [<ffffffff81066d4b>] ? autoremove_wake_function+0x0/0x38
       [<ffffffff811ed1f8>] ? radix_tree_lookup_slot+0xe/0x10
       [<ffffffff810b3ed7>] ? find_get_page+0x90/0xa5
       [<ffffffff810b5dc5>] ? filemap_fault+0x201/0x34f
       [<ffffffff810ef152>] ? fget_light+0x2f/0xac
       [<ffffffff813519e7>] ? verify_iovec+0x4f/0x94
       [<ffffffff81349a65>] sys_recvmsg+0x14d/0x223
      
      Store the serial number when beginning to walk the fib and reload
      pointers when continuing to walk after a change occurred. Similar
      to other dumping functions, this might cause unrelated entries to
      be missed when entries are deleted.
      Tested-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
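The serial-number approach described in the fib dump commit above can be illustrated on a plain linked list. This is a sketch under assumptions: the types and functions are hypothetical, the table is a list rather than a tree, and the way pointers are reloaded here (restart from the head and skip the number of entries already dumped) is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

struct node { int key; struct node *next; };

struct table {
    struct node *head;
    unsigned int sernum;   /* bumped on every modification */
};

struct walker {
    unsigned int sernum;   /* table serial number when the walk paused */
    int          count;    /* entries dumped so far */
    struct node *pos;      /* next node to dump; stale after a change */
};

static void walk_start(struct table *t, struct walker *w)
{
    w->sernum = t->sernum;
    w->count = 0;
    w->pos = t->head;
}

/* Dump up to budget keys into out; returns how many were written. */
static int walk_continue(struct table *t, struct walker *w, int budget, int *out)
{
    int n = 0;

    assert(t && w && out);
    if (w->sernum != t->sernum) {
        /* Table changed while suspended: discard the saved pointer and
         * reload it by skipping what was already dumped. */
        int skip = w->count;

        for (w->pos = t->head; w->pos && skip > 0; skip--)
            w->pos = w->pos->next;
        w->sernum = t->sernum;
    }
    while (w->pos && n < budget) {
        out[n++] = w->pos->key;
        w->count++;
        w->pos = w->pos->next;
    }
    return n;
}
```

Because the resume position is derived from a count, deleting an entry that was already dumped shifts the list and can cause a later, unrelated entry to be skipped, which matches the caveat in the commit message.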
  23. 04 Nov, 2009 1 commit
  24. 31 Jul, 2009 1 commit
    • xfrm: select sane defaults for xfrm[4|6] gc_thresh · a33bc5c1
      Neil Horman committed
      Choose saner defaults for xfrm[4|6] gc_thresh values on init
      
      Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values
      (set to 1024).  Given that the ipv4 and ipv6 routing caches are sized
      dynamically at boot time, the static selections can be nonsensical.
      This patch dynamically selects an appropriate gc threshold based on
      the corresponding main routing table size, using the assumption that
      we should in the worst case be able to handle as many connections as
      the routing table can.
      
      For ipv4, the maximum route cache size is 16 * the number of hash
      buckets in the route cache.  Given that xfrm4 starts garbage
      collection at the gc_thresh and prevents new allocations at 2 *
      gc_thresh, we set gc_thresh to half the maximum route cache size.
      
      For ipv6, it's a bit trickier.  There is no maximum route cache size,
      but the ipv6 dst_ops gc_thresh is statically set to 1024.  It seems
      sane to select a similar gc_thresh for the xfrm6 code that is half
      the number of hash buckets in the v6 route cache times 16 (like the v4
      code does).
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
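The sizing rule in the gc_thresh commit above is plain arithmetic and can be written out directly. `xfrm_gc_thresh()` is a hypothetical helper name used only for this sketch.

```c
#include <assert.h>

/* The maximum route cache size is taken as 16 entries per hash bucket.
 * gc_thresh is set to half of that, because garbage collection starts at
 * gc_thresh and new allocations are refused at 2 * gc_thresh. */
static unsigned int xfrm_gc_thresh(unsigned int hash_buckets)
{
    assert(hash_buckets <= ~0u / 16u);  /* keep 16 * buckets from overflowing */
    return (16u * hash_buckets) / 2u;
}
```

For example, 1024 hash buckets give a maximum cache size of 16384 and therefore a gc_thresh of 8192, so the hard allocation limit of 2 * gc_thresh lands exactly on the maximum cache size.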
  25. 05 Mar, 2008 1 commit
  26. 04 Mar, 2008 3 commits
  27. 08 Feb, 2008 1 commit
  28. 29 Jan, 2008 5 commits
  29. 11 Oct, 2007 1 commit
  30. 26 Apr, 2007 1 commit
  31. 26 Mar, 2007 1 commit
    • [IPV6]: Fix routing round-robin locking. · f11e6659
      David S. Miller committed
      As per RFC2461, section 6.3.6, item #2, when no routers on the
      matching list are known to be reachable or probably reachable we
      do round robin on those available routes so that we make sure
      to probe as many of them as possible to detect when one becomes
      reachable faster.
      
      Each routing table has a rwlock protecting the tree and the linked
      list of routes at each leaf.  The round robin code executes during
      lookup and thus with the rwlock taken as a reader.  A small local
      spinlock tries to provide protection but this does not work at all
      for two reasons:
      
      1) The round-robin list manipulation, as coded, goes like this (with
         read lock held):
      
      	walk routes finding head and tail
      
      	spin_lock();
      	rotate list using head and tail
      	spin_unlock();
      
         While one thread is rotating the list, another thread can
         end up with stale values of head and tail and then proceed
         to corrupt the list when it gets the lock.  This ends up causing
         the OOPS in fib6_add() later on that many people have been hitting.
      
      2) All the other code paths that run with the rwlock held as
         a reader do not expect the list to change on them, they
         expect it to remain completely fixed while they hold the
         lock in that way.
      
      So, simply stated, it is impossible to implement this correctly using
      a manipulation of the list without violating the rwlock locking
      semantics.
      
      Reimplement using a per-fib6_node round-robin pointer.  This way we
      don't need to manipulate the list at all, and since the round-robin
      pointer can only ever point to real existing entries we don't need
      to perform any locking on the changing of the round-robin pointer
      itself.  We only need to reset the round-robin pointer to NULL when
      the entry it is pointing to is removed.
      
      The idea is from Thomas Graf and it is very similar to how this
      was implemented before the advanced router selection code went in.
      Signed-off-by: David S. Miller <davem@davemloft.net>
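The per-node round-robin pointer described in the commit above can be sketched like this. The structures and function names are hypothetical simplifications for illustration; reachability checks and metric matching are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

struct route { int id; struct route *next; };

struct fib_node {
    struct route *leaf;    /* list of routes; never reordered by lookups */
    struct route *rr_ptr;  /* round-robin cursor, or NULL */
};

/* Return the current round-robin choice and advance the cursor to the
 * next route, wrapping to the start of the list. Only the cursor moves. */
static struct route *rr_select(struct fib_node *fn)
{
    struct route *cur;

    assert(fn);
    cur = fn->rr_ptr ? fn->rr_ptr : fn->leaf;
    if (cur)
        fn->rr_ptr = cur->next ? cur->next : fn->leaf;
    return cur;
}

/* When a route is removed, reset the cursor if it points at that route. */
static void rr_on_remove(struct fib_node *fn, struct route *gone)
{
    assert(fn);
    if (fn->rr_ptr == gone)
        fn->rr_ptr = NULL;
}
```

Since the list itself is never rotated, code paths that hold the lock as readers still see a fixed list, which is the property the original list-rotation scheme violated.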