1. 12 8月, 2017 1 次提交
    • D
      net: ipv4: set orig_oif based on fib result for local traffic · 839da4d9
      David Ahern 提交于
      Attempts to connect to a local address with a socket bound
      to a device with the local address hangs if there is no listener:
      
        $ ip addr sh dev eth1
        3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
          link/ether 02:e0:f9:1c:00:37 brd ff:ff:ff:ff:ff:ff
          inet 10.100.1.4/24 scope global eth1
             valid_lft forever preferred_lft forever
          inet6 2001:db8:1::4/120 scope global
             valid_lft forever preferred_lft forever
          inet6 fe80::e0:f9ff:fe1c:37/64 scope link
             valid_lft forever preferred_lft forever
      
        $ vrf-test -I eth1 -r 10.100.1.4
        <hangs when there is no server>
      
      (don't let the command name fool you; vrf-test works without vrfs.)
      
      The problem is that the original intended device, eth1 in this case, is
      lost when the tcp reset is sent, so the socket lookup does not find a
      match for the reset and the connect attempt hangs. Fix by adjusting
      orig_oif for local traffic to the device from the fib lookup result.
      
      With this patch you get the more user friendly:
        $ vrf-test -I eth1 -r 10.100.1.4
        connect failed: 111: Connection refused
      
      orig_oif is saved to the newly created rtable as rt_iif and when set
      it is used as the dif for socket lookups. It is set based on flowi4_oif
      passed in to ip_route_output_key_hash_rcu and will be set to either
      the loopback device, an l3mdev device, nothing (flowi4_oif = 0 which
      is the case in the example above) or a netdev index depending on the
      lookup path. In each case, resetting orig_oif to the device in the fib
      result for the RTN_LOCAL case allows the actual device to be preserved
      as the skb tx and rx is done over the loopback or VRF device.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      839da4d9
  2. 10 8月, 2017 1 次提交
  3. 20 6月, 2017 1 次提交
  4. 18 6月, 2017 7 次提交
    • W
      net: remove DST_NOCACHE flag · a4c2fd7f
      Wei Wang 提交于
      DST_NOCACHE flag check has been removed from dst_release() and
      dst_hold_safe() in a previous patch because all the dst are now ref
      counted properly and can be released based on refcnt only.
      Looking at the rest of the DST_NOCACHE use, all of them can now be
      removed or replaced with other checks.
      So this patch gets rid of all the DST_NOCACHE usage and remove this flag
      completely.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4c2fd7f
    • W
      net: remove DST_NOGC flag · b2a9c0ed
      Wei Wang 提交于
      Now that all the components have been changed to release dst based on
      refcnt only and not depend on dst gc anymore, we can remove the
      temporary flag DST_NOGC.
      
      Note that we also need to remove the DST_NOCACHE check in dst_release()
      and dst_hold_safe() because now all the dst are released based on refcnt
      and behaves as DST_NOCACHE.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b2a9c0ed
    • W
      ipv4: mark DST_NOGC and remove the operation of dst_free() · b838d5e1
      Wei Wang 提交于
      With the previous preparation patches, we are ready to get rid of the
      dst gc operation in ipv4 code and release dst based on refcnt only.
      So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
      to dst_free().
      At this point, all dst created in ipv4 code do not use the dst gc
      anymore and will be destroyed at the point when refcnt drops to 0.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b838d5e1
    • W
      ipv4: call dst_hold_safe() properly · 9df16efa
      Wei Wang 提交于
      This patch checks all the calls to
      dst_hold()/skb_dst_force()/dst_clone()/dst_use() to see if
      dst_hold_safe() is needed to avoid double free issue if dst
      gc is removed and dst_release() directly destroys dst when
      dst->__refcnt drops to 0.
      
      In tx path, TCP hold sk->sk_rx_dst ref count and also hold sock_lock().
      UDP and other similar protocols always hold refcount for
      skb->_skb_refdst. So both paths seem to be safe.
      
      In rx path, as it is lockless and skb_dst_set_noref() is likely to be
      used, dst_hold_safe() should always be used when trying to hold dst.
      
      In the routing code, if dst is held during an rcu protected session, it
      is necessary to call dst_hold_safe() as the current dst might be in its
      rcu grace period.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9df16efa
    • W
      ipv4: call dst_dev_put() properly · 95c47f9c
      Wei Wang 提交于
      As the intend of this patch series is to completely remove dst gc,
      we need to call dst_dev_put() to release the reference to dst->dev
      when removing routes from fib because we won't keep the gc list anymore
      and will lose the dst pointer right after removing the routes.
      Without the gc list, there is no way to find all the dst's that have
      dst->dev pointing to the going-down dev.
      Hence, we are doing dst_dev_put() immediately before we lose the last
      reference of the dst from the routing code. The next dst_check() will
      trigger a route re-lookup to find another route (if there is any).
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95c47f9c
    • W
      ipv4: take dst->__refcnt when caching dst in fib · 0830106c
      Wei Wang 提交于
      In IPv4 routing code, fib_nh and fib_nh_exception can hold pointers
      to struct rtable but they never increment dst->__refcnt.
      This leads to the need of the dst garbage collector because when user
      is done with this dst and calls dst_release(), it can only decrement
      dst->__refcnt and can not free the dst even it sees dst->__refcnt
      drops from 1 to 0 (unless DST_NOCACHE flag is set) because the routing
      code might still hold reference to it.
      And when the routing code tries to delete a route, it has to put the
      dst to the gc_list if dst->__refcnt is not yet 0 and have a gc thread
      running periodically to check on dst->__refcnt and finally to free dst
      when refcnt becomes 0.
      
      This patch increments dst->__refcnt when
      fib_nh/fib_nh_exception holds reference to this dst and properly release
      the dst when fib_nh/fib_nh_exception has been updated with a new dst.
      
      This patch is a preparation in order to fully get rid of dst gc later.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0830106c
    • W
      net: use loopback dev when generating blackhole route · 1dbe3252
      Wei Wang 提交于
      Existing ipv4/6_blackhole_route() code generates a blackhole route
      with dst->dev pointing to the passed in dst->dev.
      It is not necessary to hold reference to the passed in dst->dev
      because the packets going through this route are dropped anyway.
      A loopback interface is good enough so that we don't need to worry about
      releasing this dst->dev when this dev is going down.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1dbe3252
  5. 01 6月, 2017 1 次提交
  6. 27 5月, 2017 6 次提交
  7. 09 5月, 2017 2 次提交
  8. 25 4月, 2017 1 次提交
    • R
      ipv4: Avoid caching l3mdev dst on mismatched local route · b7c8487c
      Robert Shearman 提交于
      David reported that doing the following:
      
          ip li add red type vrf table 10
          ip link set dev eth1 vrf red
          ip addr add 127.0.0.1/8 dev red
          ip link set dev eth1 up
          ip li set red up
          ping -c1 -w1 -I red 127.0.0.1
          ip li del red
      
      when either policy routing IP rules are present or the local table
      lookup ip rule is before the l3mdev lookup results in a hang with
      these messages:
      
          unregister_netdevice: waiting for red to become free. Usage count = 1
      
      The problem is caused by caching the dst used for sending the packet
      out of the specified interface on a local route with a different
      nexthop interface. Thus the dst could stay around until the route in
      the table the lookup was done is deleted which may be never.
      
      Address the problem by not forcing output device to be the l3mdev in
      the flow's output interface if the lookup didn't use the l3mdev. This
      then results in the dst using the right device according to the route.
      
      Changes in v2:
       - make the dev_out passed in by __ip_route_output_key_hash correct
         instead of checking the nh dev if FLOWI_FLAG_SKIP_NH_OIF is set as
         suggested by David.
      
      Fixes: 5f02ce24 ("net: l3mdev: Allow the l3mdev to be a loopback")
      Reported-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Suggested-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NRobert Shearman <rshearma@brocade.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b7c8487c
  9. 18 4月, 2017 1 次提交
  10. 14 4月, 2017 2 次提交
  11. 07 4月, 2017 2 次提交
    • F
      net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given · bbadb9a2
      Florian Larysch 提交于
      inet_rtm_getroute synthesizes a skeletal ICMP skb, which is passed to
      ip_route_input when iif is given. If a multipath route is present for
      the designated destination, fib_multipath_hash ends up being called with
      that skb. However, as that skb contains no information beyond the
      protocol type, the calculated hash does not match the one we would see
      for a real packet.
      
      There is currently no way to fix this for layer 4 hashing, as
      RTM_GETROUTE doesn't have the necessary information to create layer 4
      headers. To fix this for layer 3 hashing, set appropriate saddr/daddrs
      in the skb and also change the protocol to UDP to avoid special
      treatment for ICMP.
      Signed-off-by: NFlorian Larysch <fl@n621.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbadb9a2
    • F
      net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given · a8801799
      Florian Larysch 提交于
      inet_rtm_getroute synthesizes a skeletal ICMP skb, which is passed to
      ip_route_input when iif is given. If a multipath route is present for
      the designated destination, ip_multipath_icmp_hash ends up being called,
      which uses the source/destination addresses within the skb to calculate
      a hash. However, those are not set in the synthetic skb, causing it to
      return an arbitrary and incorrect result.
      
      Instead, use UDP, which gets no such special treatment.
      Signed-off-by: NFlorian Larysch <fl@n621.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8801799
  12. 22 3月, 2017 1 次提交
    • N
      net: ipv4: add support for ECMP hash policy choice · bf4e0a3d
      Nikolay Aleksandrov 提交于
      This patch adds support for ECMP hash policy choice via a new sysctl
      called fib_multipath_hash_policy and also adds support for L4 hashes.
      The current values for fib_multipath_hash_policy are:
       0 - layer 3 (default)
       1 - layer 4
      If there's an skb hash already set and it matches the chosen policy then it
      will be used instead of being calculated (currently only for L4).
      In L3 mode we always calculate the hash due to the ICMP error special
      case, the flow dissector's field consistentification should handle the
      address order thus we can remove the address reversals.
      If the skb is provided we always use it for the hash calculation,
      otherwise we fallback to fl4, that is if skb is NULL fl4 has to be set.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf4e0a3d
  13. 27 2月, 2017 2 次提交
  14. 08 2月, 2017 1 次提交
  15. 13 1月, 2017 1 次提交
  16. 10 1月, 2017 1 次提交
  17. 09 1月, 2017 2 次提交
  18. 07 1月, 2017 1 次提交
  19. 30 12月, 2016 1 次提交
  20. 25 12月, 2016 1 次提交
  21. 23 12月, 2016 1 次提交
  22. 02 12月, 2016 2 次提交
  23. 01 12月, 2016 1 次提交