1. 28 Jul, 2015 1 commit
  2. 25 Jul, 2015 1 commit
    • ipv4: consider TOS in fib_select_default · 2392debc
      Julian Anastasov committed
      fib_select_default considers alternative routes only when
      res->fi is for the first alias in res->fa_head. In the
      common case this can happen only when the initial lookup
      matches the first alias with highest TOS value. This
      prevents the alternative routes from requiring a specific TOS.
      
      This patch solves the problem as follows:
      
      - routes that require specific TOS should be returned by
      fib_select_default only when TOS matches, as already done
      in fib_table_lookup. This rule implies that depending on the
      TOS we can have many different lists of alternative gateways
      and we have to keep the last used gateway (fa_default) in first
      alias for the TOS instead of using single tb_default value.
      
      - as the aliases are ordered by many keys (TOS desc,
      fib_priority asc), we restrict the possible results to
      routes with matching TOS and lowest metric (fib_priority)
      and routes that match any TOS, again with lowest metric.
      
      For example, packet with TOS 8 can not use gw3 (not lowest
      metric), gw4 (different TOS) and gw6 (not lowest metric),
      all other gateways can be used:
      
      tos 8 via gw1 metric 2 <--- res->fa_head and res->fi
      tos 8 via gw2 metric 2
      tos 8 via gw3 metric 3
      tos 4 via gw4
      tos 0 via gw5
      tos 0 via gw6 metric 1
      Reported-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
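The filtering rule described in this commit can be sketched in user-space C. This is only an illustration, not the kernel code: `struct route` and `select_candidates` are hypothetical names, TOS 0 is assumed to mean "matches any TOS", and the input is assumed to be already ordered by TOS descending, then metric ascending. The real fib_select_default does more work on top of this (for example, it remembers the last used gateway in fa_default).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one fib_alias entry. */
struct route {
    const char *gw;
    unsigned char tos;    /* assumption: 0 matches any TOS */
    unsigned int metric;  /* fib_priority */
};

/*
 * Mark the routes that may serve as alternative gateways for a packet
 * with the given TOS and return how many were marked.
 * routes[] must be ordered by TOS descending, then metric ascending,
 * so the first entry seen in each group carries the lowest metric.
 */
size_t select_candidates(const struct route *routes, size_t n,
                         unsigned char tos, bool *eligible)
{
    bool seen_tos = false, seen_any = false;
    unsigned int tos_metric = 0, any_metric = 0;
    size_t count = 0;

    assert(routes != NULL && eligible != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct route *r = &routes[i];

        eligible[i] = false;
        if (r->tos != 0 && r->tos != tos)
            continue;               /* route requires a different TOS */

        if (r->tos != 0) {          /* group 1: routes with matching TOS */
            if (!seen_tos) {
                seen_tos = true;
                tos_metric = r->metric;
            }
            if (r->metric != tos_metric)
                continue;           /* not the lowest metric in group */
        } else {                    /* group 2: routes for any TOS */
            if (!seen_any) {
                seen_any = true;
                any_metric = r->metric;
            }
            if (r->metric != any_metric)
                continue;
        }
        eligible[i] = true;
        count++;
    }
    return count;
}
```

Fed with the six routes from the commit message and TOS 8, the sketch marks gw1, gw2 and gw5 and rejects gw3, gw4 and gw6, which matches the example above.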
  3. 24 Jun, 2015 1 commit
    • net: ipv4 sysctl option to ignore routes when nexthop link is down · 0eeb075f
      Andy Gospodarek committed
      This feature is only enabled with the new per-interface or ipv4 global
      sysctls called 'ignore_routes_with_linkdown'.
      
      net.ipv4.conf.all.ignore_routes_with_linkdown = 0
      net.ipv4.conf.default.ignore_routes_with_linkdown = 0
      net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
      ...
      
      When the above sysctls are set, the kernel will report to userspace that a route is
      dead and will no longer resolve to this nexthop when performing a fib
      lookup.  This will signal to userspace that the route will not be
      selected.  The signalling of a RTNH_F_DEAD is only passed to userspace
      if the sysctl is enabled and link is down.  This was done as without it
      the netlink listeners would have no idea whether or not a nexthop would
      be selected.   The kernel only sets RTNH_F_DEAD internally if the
      interface has IFF_UP cleared.
      
      With the new sysctl set, the following behavior can be observed
      (interface p8p1 is link-down):
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
          cache
      
      While the route does remain in the table (so it can be modified if
      needed rather than being wiped away as it would be if IFF_UP was
      cleared), the proper next-hop is chosen automatically when the link is
      down.  Now interface p8p1 is linked-up:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
      90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      
      and the output changes to what one would expect.
      
      If the sysctl is not set, the following output would be expected when
      p8p1 is down:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      
      Since the dead flag does not appear, there should be no expectation that
      the kernel would skip using this route due to link being down.
      
      v2: Split kernel changes into 2 patches, this actually makes a
      behavioral change if the sysctl is set.  Also took suggestion from Alex
      to simplify code by only checking sysctl during fib lookup and
      suggestion from Scott to add a per-interface sysctl.
      
      v3: Code clean-ups to make it more readable and efficient as well as a
      reverse path check fix.
      
      v4: Drop binary sysctl
      
      v5: Whitespace fixups from Dave
      
      v6: Style changes from Dave and checkpatch suggestions
      
      v7: One more checkpatch fixup
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
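The lookup behavior described in this commit can be illustrated with a small user-space sketch. It is not the kernel implementation: `struct nexthop`, `pick_nexthop` and the `ignore_linkdown` parameter are hypothetical names standing in for the real FIB structures and the ignore_routes_with_linkdown sysctl, and route selection is reduced to "lowest metric wins".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one candidate nexthop. */
struct nexthop {
    const char *gw;
    const char *dev;
    unsigned int metric;
    bool linkdown;   /* carrier is down, interface is still IFF_UP */
};

/*
 * Return the usable nexthop with the lowest metric, or NULL if none.
 * When ignore_linkdown is true (sysctl set), nexthops whose link is
 * down are skipped; otherwise they stay eligible.
 */
const struct nexthop *pick_nexthop(const struct nexthop *nh, size_t n,
                                   bool ignore_linkdown)
{
    const struct nexthop *best = NULL;

    assert(nh != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (ignore_linkdown && nh[i].linkdown)
            continue;   /* treated as dead for this lookup */
        if (best == NULL || nh[i].metric < best->metric)
            best = &nh[i];
    }
    return best;
}
```

With the two 90.0.0.0/24 routes from the output above (metric 1 via p8p1 with link down, metric 2 via p7p1), the sketch returns the p7p1 nexthop when the flag is set and the p8p1 nexthop when it is not.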
  4. 22 Jun, 2015 1 commit
  5. 08 Jun, 2015 1 commit
    • fib_trie: coding style: Use pointer after check · f38b24c9
      Firo Yang committed
      As Alexander Duyck pointed out:
      struct tnode {
              ...
              struct key_vector kv[1];
      }
      The kv[1] member of struct tnode is an array, so referencing it through
      a null pointer will not crash the system, like this:
      struct tnode *p = NULL;
      struct key_vector *kv = p->kv;
      As such p->kv doesn't actually dereference anything, it is simply a
      means for getting the offset to the array from the pointer p.
      
      This patch makes the code more regular so that it does not look
      odd to people reading it.
      Signed-off-by: Firo Yang <firogm@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
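The point about p->kv being only an offset computation can be shown in a small user-space sketch. The struct bodies here are placeholders (the commit message elides the real fields), and `kv_address` is a hypothetical helper. The sketch uses offsetof on plain integers rather than forming p->kv from a null pointer, since the latter is formally undefined in ISO C even though it does not read memory in practice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder definitions; the real kernel structs have other fields. */
struct key_vector {
    unsigned long key;
};

struct tnode {
    void *head;                 /* stands in for the elided members */
    struct key_vector kv[1];
};

/* kv sits after the leading members, so its offset is non-zero. */
static_assert(offsetof(struct tnode, kv) >= sizeof(void *),
              "kv follows the leading members");

/*
 * Compute where kv[] lives for a tnode at address tnode_addr.
 * This is pure arithmetic: nothing at tnode_addr is ever read, which
 * is why taking p->kv does not fault even when p is NULL.
 */
uintptr_t kv_address(uintptr_t tnode_addr)
{
    return tnode_addr + offsetof(struct tnode, kv);
}
```

Calling `kv_address(0)` simply yields the offset of kv inside struct tnode, which is the value the commit message says p->kv amounts to for a null p.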
  6. 27 May, 2015 1 commit
    • ipv4: Fix fib_trie.c build, missing linux/vmalloc.h include. · ffa915d0
      David S. Miller committed
      We used to get this indirectly, I suppose, but no longer do.
      
      Either way, an explicit include should have been done in the
      first place.
      
         net/ipv4/fib_trie.c: In function '__node_free_rcu':
      >> net/ipv4/fib_trie.c:293:3: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
            vfree(n);
            ^
         net/ipv4/fib_trie.c: In function 'tnode_alloc':
      >> net/ipv4/fib_trie.c:312:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
            return vzalloc(size);
            ^
      >> net/ipv4/fib_trie.c:312:3: warning: return makes pointer from integer without a cast
         cc1: some warnings being treated as errors
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 23 May, 2015 1 commit
  8. 15 May, 2015 1 commit
  9. 13 May, 2015 1 commit
  10. 04 Apr, 2015 2 commits
  11. 24 Mar, 2015 1 commit
  12. 13 Mar, 2015 1 commit
    • fib_trie: Provide a deterministic order for fib_alias w/ tables merged · 0b65bd97
      Alexander Duyck committed
      This change makes it so that we should always have a deterministic ordering
      for the main and local aliases within the merged table when two leaves
      overlap.
      
      For example, take a leaf with a key of 192.168.254.0.  Previously, if we
      added two aliases with a prefix length of 24 from both local and main,
      the first entry would be first and the second would be second.  When I
      was coding this I had added a WARN_ON should such a situation occur as I
      wasn't sure how likely it would be.  However this WARN_ON has been
      triggered so this is something that should be addressed.
      
      With this patch the ordering of the aliases is as follows.  First they are
      sorted on prefix length, then on their table ID, then tos, and finally
      priority.  This way what we end up doing is essentially interleaving the
      two tables on what used to be leaf_info structure boundaries.
      
      Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
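The four-key ordering named in this commit can be sketched as a qsort comparator. `struct alias` and both function names are hypothetical, and the sort directions are partly assumptions: TOS descending and priority ascending follow the ordering quoted in the fib_select_default commit, while "longer prefix first" and "higher table ID first" (so that local, ID 255, precedes main, ID 254) are not stated in this commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified alias carrying only the four sort keys. */
struct alias {
    unsigned char prefix_len;
    unsigned int tb_id;
    unsigned char tos;
    unsigned int priority;
};

/* Three-way helpers: negative means "a sorts before b". */
static int cmp_desc(unsigned int a, unsigned int b)
{
    return (a < b) - (a > b);
}

static int cmp_asc(unsigned int a, unsigned int b)
{
    return (a > b) - (a < b);
}

/* Compare on prefix length, then table ID, then TOS, then priority. */
int alias_cmp(const void *pa, const void *pb)
{
    const struct alias *a = pa, *b = pb;
    int c;

    assert(a != NULL && b != NULL);

    if ((c = cmp_desc(a->prefix_len, b->prefix_len)) != 0)
        return c;                               /* assumed direction */
    if ((c = cmp_desc(a->tb_id, b->tb_id)) != 0)
        return c;                               /* assumed direction */
    if ((c = cmp_desc(a->tos, b->tos)) != 0)
        return c;
    return cmp_asc(a->priority, b->priority);
}

void sort_aliases(struct alias *v, size_t n)
{
    qsort(v, n, sizeof(*v), alias_cmp);
}
```

Because every key takes part in the comparison, two aliases from different tables with the same prefix length always land in the same relative order, which is the determinism the commit is after.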
  13. 12 Mar, 2015 2 commits
    • fib_trie: Only display main table in /proc/net/route · 654eff45
      Alexander Duyck committed
      When we merged the tries for local and main I had overlooked the iterator
      for /proc/net/route.  As a result it was outputting both local and main
      when the two tries were merged.
      
      This patch resolves that by only providing output for aliases that are
      actually in the main trie.  As a result we should go back to the original
      behavior which I assume will be necessary to maintain legacy support.
      
      Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse")
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: FIB Local/MAIN table collapse · 0ddcf43d
      Alexander Duyck committed
      This patch is meant to collapse local and main into one by converting
      tb_data from an array to a pointer.  Doing this allows us to point the
      local table into the main while maintaining the same variables in the
      table.
      
      As such the tb_data was converted from an array to a pointer, and a new
      array called data is added in order to still provide an object for tb_data
      to point to.
      
      In order to track the origin of the fib aliases a tb_id value was added in
      a hole that existed on 64b systems.  Using this we can also reverse the
      merge in the event that custom FIB rules are enabled.
      
      With this patch I am seeing an improvement of 20ns to 30ns for routing
      lookups as long as custom rules are not enabled.  With custom rules enabled
      we fall back to split tables and the original behavior.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
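The array-to-pointer conversion described in this commit can be sketched in user space as follows. Only the field names tb_id, tb_data and data come from the commit message; the field types, the one-element storage size, and the helpers `table_new` and `table_unmerge` are assumptions made for illustration and do not mirror the kernel's allocation code.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified table: tb_data is a pointer, data is its default backing store. */
struct fib_table {
    unsigned int tb_id;
    unsigned long *tb_data;   /* previously an array, now a pointer */
    unsigned long data[1];    /* storage for tb_data to point at */
};

/*
 * Allocate a table. If merge_into is non-NULL, the new table shares
 * that table's storage (the "collapse"); otherwise it uses its own.
 */
struct fib_table *table_new(unsigned int id, struct fib_table *merge_into)
{
    struct fib_table *tb = calloc(1, sizeof(*tb));

    if (tb == NULL)
        return NULL;
    tb->tb_id = id;
    tb->tb_data = merge_into ? merge_into->tb_data : tb->data;
    return tb;
}

/*
 * Point the table back at its own storage. A real un-merge would also
 * have to move entries according to the tb_id recorded on each alias;
 * this sketch only restores the pointer.
 */
void table_unmerge(struct fib_table *tb)
{
    assert(tb != NULL);
    tb->tb_data = tb->data;
}
```

In this sketch, creating main first and then creating local with main as `merge_into` leaves both tb_data pointers equal, so a lookup through either table walks the same storage.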
  14. 11 Mar, 2015 2 commits
  15. 10 Mar, 2015 1 commit
  16. 07 Mar, 2015 10 commits
  17. 06 Mar, 2015 3 commits
  18. 05 Mar, 2015 8 commits
  19. 28 Feb, 2015 1 commit