1. 06 9月, 2014 2 次提交
  2. 22 9月, 2013 1 次提交
  3. 02 7月, 2013 1 次提交
  4. 29 6月, 2013 1 次提交
    • T
      ipv4: use next hop exceptions also for input routes · 2ffae99d
      Timo Teräs 提交于
      Commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops)
      assmued that "locally destined, and routed packets, never trigger
      PMTU events or redirects that will be processed by us".
      
      However, it seems that tunnel devices do trigger PMTU events in certain
      cases. At least ip_gre, ip6_gre, sit, and ipip do use the inner flow's
      skb_dst(skb)->ops->update_pmtu to propage mtu information from the
      outer flows. These can cause the inner flow mtu to be decreased. If
      next hop exceptions are not consulted for pmtu, IP fragmentation will
      not be done properly for these routes.
      
      It also seems that we really need to have the PMTU information always
      for netfilter TCPMSS clamp-to-pmtu feature to work properly.
      
      So for the time being, cache separate copies of input routes for
      each next hop exception.
      Signed-off-by: NTimo Teräs <timo.teras@iki.fi>
      Reviewed-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ffae99d
  5. 03 6月, 2013 1 次提交
    • T
      ipv4: use separate genid for next hop exceptions · 5aad1de5
      Timo Teräs 提交于
      commit 13d82bf5 (ipv4: Fix flushing of cached routing informations)
      added the support to flush learned pmtu information.
      
      However, using rt_genid is quite heavy as it is bumped on route
      add/change and multicast events amongst other places. These can
      happen quite often, especially if using dynamic routing protocols.
      
      While this is ok with routes (as they are just recreated locally),
      the pmtu information is learned from remote systems and the icmp
      notification can come with long delays. It is worthy to have separate
      genid to avoid excessive pmtu resets.
      
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NTimo Teräs <timo.teras@iki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5aad1de5
  6. 13 3月, 2013 1 次提交
    • D
      ipv4: fix definition of FIB_TABLE_HASHSZ · 5b9e12db
      Denis V. Lunev 提交于
      a long time ago by the commit
      
        commit 93456b6d
        Author: Denis V. Lunev <den@openvz.org>
        Date:   Thu Jan 10 03:23:38 2008 -0800
      
          [IPV4]: Unify access to the routing tables.
      
      the defenition of FIB_HASH_TABLE size has obtained wrong dependency:
      it should depend upon CONFIG_IP_MULTIPLE_TABLES (as was in the original
      code) but it was depended from CONFIG_IP_ROUTE_MULTIPATH
      
      This patch returns the situation to the original state.
      
      The problem was spotted by Tingwei Liu.
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      CC: Tingwei Liu <tingw.liu@gmail.com>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b9e12db
  7. 05 10月, 2012 1 次提交
    • E
      ipv4: add a fib_type to fib_info · f4ef85bb
      Eric Dumazet 提交于
      commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops.)
      introduced a regression for forwarding.
      
      This was hard to reproduce but the symptom was that packets were
      delivered to local host instead of being forwarded.
      
      David suggested to add fib_type to fib_info so that we dont
      inadvertently share same fib_info for different purposes.
      
      With help from Julian Anastasov who provided very helpful
      hints, reproduced here :
      
      <quote>
              Can it be a problem related to fib_info reuse
      from different routes. For example, when local IP address
      is created for subnet we have:
      
      broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src
      192.168.0.1
      192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
      local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1
      
              The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
      a reused fib_info structure where we put cached routes.
      The result can be same fib_info for 192.168.0.255 and
      192.168.0.0/24. RTN_BROADCAST is cached only for input
      routes. Incoming broadcast to 192.168.0.255 can be cached
      and can cause problems for traffic forwarded to 192.168.0.0/24.
      So, this patch should solve the problem because it
      separates the broadcast from unicast traffic.
      
              And the ip_route_input_slow caching will work for
      local and broadcast input routes (above routes 1 and 3) just
      because they differ in scope and use different fib_info.
      
      </quote>
      
      Many thanks to Chris Clayton for his patience and help.
      Reported-by: NChris Clayton <chris2553@googlemail.com>
      Bisected-by: NChris Clayton <chris2553@googlemail.com>
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Tested-by: NChris Clayton <chris2553@googlemail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4ef85bb
  8. 01 8月, 2012 3 次提交
    • D
      c5038a83
    • E
      ipv4: percpu nh_rth_output cache · d26b3a7c
      Eric Dumazet 提交于
      Input path is mostly run under RCU and doesnt touch dst refcnt
      
      But output path on forwarding or UDP workloads hits
      badly dst refcount, and we have lot of false sharing, for example
      in ipv4_mtu() when reading rt->rt_pmtu
      
      Using a percpu cache for nh_rth_output gives a nice performance
      increase at a small cost.
      
      24 udpflood test on my 24 cpu machine (dummy0 output device)
      (each process sends 1.000.000 udp frames, 24 processes are started)
      
      before : 5.24 s
      after : 2.06 s
      For reference, time on linux-3.5 : 6.60 s
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d26b3a7c
    • E
      ipv4: Restore old dst_free() behavior. · 54764bb6
      Eric Dumazet 提交于
      commit 404e0a8b (net: ipv4: fix RCU races on dst refcounts) tried
      to solve a race but added a problem at device/fib dismantle time :
      
      We really want to call dst_free() as soon as possible, even if sockets
      still have dst in their cache.
      dst_release() calls in free_fib_info_rcu() are not welcomed.
      
      Root of the problem was that now we also cache output routes (in
      nh_rth_output), we must use call_rcu() instead of call_rcu_bh() in
      rt_free(), because output route lookups are done in process context.
      
      Based on feedback and initial patch from David Miller (adding another
      call_rcu_bh() call in fib, but it appears it was not the right fix)
      
      I left the inet_sk_rx_dst_set() helper and added __rcu attributes
      to nh_rth_output and nh_rth_input to better document what is going on in
      this code.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54764bb6
  9. 21 7月, 2012 2 次提交
    • D
      ipv4: Cache input routes in fib_info nexthops. · d2d68ba9
      David S. Miller 提交于
      Caching input routes is slightly simpler than output routes, since we
      don't need to be concerned with nexthop exceptions.  (locally
      destined, and routed packets, never trigger PMTU events or redirects
      that will be processed by us).
      
      However, we have to elide caching for the DIRECTSRC and non-zero itag
      cases.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2d68ba9
    • D
      ipv4: Cache output routes in fib_info nexthops. · f2bb4bed
      David S. Miller 提交于
      If we have an output route that lacks nexthop exceptions, we can cache
      it in the FIB info nexthop.
      
      Such routes will have DST_HOST cleared because such routes refer to a
      family of destinations, rather than just one.
      
      The sequence of the handling of exceptions during route lookup is
      adjusted to make the logic work properly.
      
      Before we allocate the route, we lookup the exception.
      
      Then we know if we will cache this route or not, and therefore whether
      DST_HOST should be set on the allocated route.
      
      Then we use DST_HOST to key off whether we should store the resulting
      route, during rt_set_nexthop(), in the FIB nexthop cache.
      
      With help from Eric Dumazet.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f2bb4bed
  10. 20 7月, 2012 1 次提交
    • J
      ipv4: use seqlock for nh_exceptions · aee06da6
      Julian Anastasov 提交于
      Use global seqlock for the nh_exceptions. Call
      fnhe_oldest with the right hash chain. Correct the diff
      value for dst_set_expires.
      
      v2: after suggestions from Eric Dumazet:
      * get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
      * continue daddr search in rt_bind_exception
      
      v3:
      * remove the daddr check before seqlock in rt_bind_exception
      * restart lookup in rt_bind_exception on detected seqlock change,
      as suggested by David Miller
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aee06da6
  11. 17 7月, 2012 1 次提交
    • D
      ipv4: Add FIB nexthop exceptions. · 4895c771
      David S. Miller 提交于
      In a regime where we have subnetted route entries, we need a way to
      store persistent storage about destination specific learned values
      such as redirects and PMTU values.
      
      This is implemented here via nexthop exceptions.
      
      The initial implementation is a 2048 entry hash table with relaiming
      starting at chain length 5.  A more sophisticated scheme can be
      devised if that proves necessary.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4895c771
  12. 13 7月, 2012 2 次提交
  13. 11 7月, 2012 1 次提交
  14. 06 7月, 2012 1 次提交
  15. 29 6月, 2012 2 次提交
    • D
      ipv4: Elide fib_validate_source() completely when possible. · 7a9bc9b8
      David S. Miller 提交于
      If rpfilter is off (or the SKB has an IPSEC path) and there are not
      tclassid users, we don't have to do anything at all when
      fib_validate_source() is invoked besides setting the itag to zero.
      
      We monitor tclassid uses with a counter (modified only under RTNL and
      marked __read_mostly) and we protect the fib_validate_source() real
      work with a test against this counter and whether rpfilter is to be
      done.
      
      Having a way to know whether we need no tclassid processing or not
      also opens the door for future optimized rpfilter algorithms that do
      not perform full FIB lookups.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7a9bc9b8
    • D
      ipv4: Adjust in_dev handling in fib_validate_source() · 9e56e380
      David S. Miller 提交于
      Checking for in_dev being NULL is pointless.
      
      In fact, all of our callers have in_dev precomputed already,
      so just pass it in and remove the NULL checking.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9e56e380
  16. 28 6月, 2012 2 次提交
    • D
      ipv4: Kill rt->rt_spec_dst, no longer used. · 41347dcd
      David S. Miller 提交于
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41347dcd
    • D
      ipv4: Create and use fib_compute_spec_dst() helper. · 35ebf65e
      David S. Miller 提交于
      The specific destination is the host we direct unicast replies to.
      Usually this is the original packet source address, but if we are
      responding to a multicast or broadcast packet we have to use something
      different.
      
      Specifically we must use the source address we would use if we were to
      send a packet to the unicast source of the original packet.
      
      The routing cache precomputes this value, but we want to remove that
      precomputation because it creates a hard dependency on the expensive
      rpfilter source address validation which we'd like to make cheaper.
      
      There are only three places where this matters:
      
      1) ICMP replies.
      
      2) pktinfo CMSG
      
      3) IP options
      
      Now there will be no real users of rt->rt_spec_dst and we can simply
      remove it altogether.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35ebf65e
  17. 11 6月, 2012 1 次提交
  18. 16 4月, 2012 1 次提交
  19. 15 4月, 2011 1 次提交
    • D
      ipv4: Call fib_select_default() only when actually necessary. · 21d8c49e
      David S. Miller 提交于
      fib_select_default() is a complete NOP, and completely pointless
      to invoke, when we have no more than 1 default route installed.
      
      And this is far and away the common case.
      
      So remember how many prefixlen==0 routes we have in the routing
      table, and elide the call when we have no more than one of those.
      
      This cuts output route creation time by 157 cycles on Niagara2+.
      
      In order to add the new int to fib_table, we have to correct the type
      of ->tb_data[] to unsigned long, otherwise the private area will be
      unaligned on 64-bit systems.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Reviewed-by: NEric Dumazet <eric.dumazet@gmail.com>
      21d8c49e
  20. 11 4月, 2011 1 次提交
  21. 25 3月, 2011 2 次提交
  22. 13 3月, 2011 1 次提交
  23. 11 3月, 2011 1 次提交
  24. 09 3月, 2011 1 次提交
  25. 08 3月, 2011 1 次提交
    • D
      ipv4: Cache source address in nexthop entries. · 1fc050a1
      David S. Miller 提交于
      When doing output route lookups, we have to select the source address
      if the user has not specified an explicit one.
      
      First, if the route has an explicit preferred source address
      specified, then we use that.
      
      Otherwise we search the route's outgoing interface for a suitable
      address.
      
      This search can be precomputed and cached at route insertion time.
      
      The only missing part is that we have to refresh this precomputed
      value any time addresses are added or removed from the interface, and
      this is accomplished by fib_update_nh_saddrs().
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1fc050a1
  26. 18 2月, 2011 2 次提交
  27. 02 2月, 2011 1 次提交
  28. 01 2月, 2011 2 次提交
    • D
      ipv4: Consolidate all default route selection implementations. · 0c838ff1
      David S. Miller 提交于
      Both fib_trie and fib_hash have a local implementation of
      fib_table_select_default().  This is completely unnecessary
      code duplication.
      
      Since we now remember the fib_table and the head of the fib
      alias list of the default route, we can implement one single
      generic version of this routine.
      
      Looking at the fib_hash implementation you may get the impression
      that it's possible for there to be multiple top-level routes in
      the table for the default route.  The truth is, it isn't, the
      insert code will only allow one entry to exist in the zero
      prefix hash table, because all keys evaluate to zero and all
      keys in a hash table must be unique.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c838ff1
    • D
      ipv4: Remember FIB alias list head and table in lookup results. · 5b470441
      David S. Miller 提交于
      This will be used later to implement fib_select_default() in a
      completely generic manner, instead of the current situation where the
      default route is re-looked up in the TRIE/HASH table and then the
      available aliases are analyzed.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b470441
  29. 29 1月, 2011 1 次提交
  30. 14 1月, 2011 1 次提交
    • P
      netfilter: fix Kconfig dependencies · c7066f70
      Patrick McHardy 提交于
      Fix dependencies of netfilter realm match: it depends on NET_CLS_ROUTE,
      which itself depends on NET_SCHED; this dependency is missing from netfilter.
      
      Since matching on realms is also useful without having NET_SCHED enabled and
      the option really only controls whether the tclassid member is included in
      route and dst entries, rename the config option to IP_ROUTE_CLASSID and move
      it outside of traffic scheduling context to get rid of the NET_SCHED dependeny.
      Reported-by: NVladis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      c7066f70