1. 16 10月, 2015 2 次提交
    • M
      ipv6: Initialize rt6_info properly in ip6_blackhole_route() · 0a1f5962
      Martin KaFai Lau 提交于
      ip6_blackhole_route() does not initialize the newly allocated
      rt6_info properly.  This patch:
      1. Call rt6_info_init() to initialize rt6i_siblings and rt6i_uncached
      
      2. The current rt->dst._metrics init code is incorrect:
         - 'rt->dst._metrics = ort->dst._metris' is not always safe
         - Not sure what dst_copy_metrics() is trying to do here
           considering ip6_rt_blackhole_cow_metrics() always returns
           NULL
      
         Fix:
         - Always do dst_copy_metrics()
         - Replace ip6_rt_blackhole_cow_metrics() with
           dst_cow_metrics_generic()
      
      3. Mask out the RTF_PCPU bit from the newly allocated blackhole route.
         This bug triggers an oops (reported by Phil Sutter) in rt6_get_cookie().
         It is because RTF_PCPU is set while rt->dst.from is NULL.
      
      Fixes: d52d3997 ("ipv6: Create percpu rt6_info")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Reported-by: NPhil Sutter <phil@nwl.cc>
      Tested-by: NPhil Sutter <phil@nwl.cc>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: Phil Sutter <phil@nwl.cc>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a1f5962
    • M
      ipv6: Move common init code for rt6_info to a new function rt6_info_init() · ebfa45f0
      Martin KaFai Lau 提交于
      Introduce rt6_info_init() to do the common init work for
      'struct rt6_info' (after calling dst_alloc).
      
      It is a prep work to fix the rt6_info init logic in the
      ip6_blackhole_route().
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: Phil Sutter <phil@nwl.cc>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ebfa45f0
  2. 13 10月, 2015 1 次提交
    • E
      ipv6: Don't call with rt6_uncached_list_flush_dev · e332bc67
      Eric W. Biederman 提交于
      As originally written rt6_uncached_list_flush_dev makes no sense when
      called with dev == NULL as it attempts to flush all uncached routes
      regardless of network namespace when dev == NULL.  Which is simply
      incorrect behavior.
      
      Furthermore at the point rt6_ifdown is called with dev == NULL no more
      network devices exist in the network namespace so even if the code in
      rt6_uncached_list_flush_dev were to attempt something sensible it
      would be meaningless.
      
      Therefore remove support in rt6_uncached_list_flush_dev for handling
      network devices where dev == NULL, and only call rt6_uncached_list_flush_dev
       when rt6_ifdown is called with a network device.
      
      Fixes: 8d0b94af ("ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Reviewed-by: NMartin KaFai Lau <kafai@fb.com>
      Tested-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e332bc67
  3. 30 9月, 2015 1 次提交
    • D
      net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set · 741a11d9
      David Ahern 提交于
      Wolfgang reported that IPv6 stack is ignoring oif in output route lookups:
      
          With ipv6, ip -6 route get always returns the specific route.
      
          $ ip -6 r
          2001:db8:e2::1 dev enp2s0  proto kernel  metric 256
          2001:db8:e2::/64 dev enp2s0  metric 1024
          2001:db8:e3::1 dev enp3s0  proto kernel  metric 256
          2001:db8:e3::/64 dev enp3s0  metric 1024
          fe80::/64 dev enp3s0  proto kernel  metric 256
          default via 2001:db8:e3::255 dev enp3s0  metric 1024
      
          $ ip -6 r get 2001:db8:e2::100
          2001:db8:e2::100 from :: dev enp2s0  src 2001:db8:e3::1  metric 0
              cache
      
          $ ip -6 r get 2001:db8:e2::100 oif enp3s0
          2001:db8:e2::100 from :: dev enp2s0  src 2001:db8:e3::1  metric 0
              cache
      
      The stack does consider the oif but a mismatch in rt6_device_match is not
      considered fatal because RT6_LOOKUP_F_IFACE is not set in the flags.
      
      Cc: Wolfgang Nothdurft <netdev@linux-dude.de>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      741a11d9
  4. 21 9月, 2015 1 次提交
    • N
      net: Fix behaviour of unreachable, blackhole and prohibit routes · 0315e382
      Nikola Forró 提交于
      Man page of ip-route(8) says following about route types:
      
        unreachable - these destinations are unreachable.  Packets are dis‐
        carded and the ICMP message host unreachable is generated.  The local
        senders get an EHOSTUNREACH error.
      
        blackhole - these destinations are unreachable.  Packets are dis‐
        carded silently.  The local senders get an EINVAL error.
      
        prohibit - these destinations are unreachable.  Packets are discarded
        and the ICMP message communication administratively prohibited is
        generated.  The local senders get an EACCES error.
      
      In the inet6 address family, this was correct, except the local senders
      got ENETUNREACH error instead of EHOSTUNREACH in case of unreachable route.
      In the inet address family, all three route types generated ICMP message
      net unreachable, and the local senders got ENETUNREACH error.
      
      In both address families all three route types now behave consistently
      with documentation.
      Signed-off-by: NNikola Forró <nforro@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0315e382
  5. 18 9月, 2015 1 次提交
  6. 16 9月, 2015 1 次提交
    • M
      ipv6: Avoid double dst_free · 8e3d5be7
      Martin KaFai Lau 提交于
      It is a prep work to get dst freeing from fib tree undergo
      a rcu grace period.
      
      The following is a common paradigm:
      if (ip6_del_rt(rt))
      	dst_free(rt)
      
      which means, if rt cannot be deleted from the fib tree, dst_free(rt) now.
      1. We don't know the ip6_del_rt(rt) failure is because it
         was not managed by fib tree (e.g. DST_NOCACHE) or it had already been
         removed from the fib tree.
      2. If rt had been managed by the fib tree, ip6_del_rt(rt) failure means
         dst_free(rt) has been called already.  A second
         dst_free(rt) is not always obviously safe.  The rt may have
         been destroyed already.
      3. If rt is a DST_NOCACHE, dst_free(rt) should not be called.
      4. It is a stopper to make dst freeing from fib tree undergo a
         rcu grace period.
      
      This patch is to use a DST_NOCACHE flag to indicate a rt is
      not managed by the fib tree.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8e3d5be7
  7. 10 9月, 2015 2 次提交
    • W
      ipv6: fix ifnullfree.cocci warnings · 52fe51f8
      Wu Fengguang 提交于
      net/ipv6/route.c:2946:3-8: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.
      
       NULL check before some freeing functions is not needed.
      
       Based on checkpatch warning
       "kfree(NULL) is safe this check is probably not required"
       and kfreeaddr.cocci by Julia Lawall.
      
      Generated by: scripts/coccinelle/free/ifnullfree.cocci
      
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52fe51f8
    • R
      ipv6: fix multipath route replace error recovery · 6b9ea5a6
      Roopa Prabhu 提交于
      Problem:
      The ecmp route replace support for ipv6 in the kernel, deletes the
      existing ecmp route too early, ie when it installs the first nexthop.
      If there is an error in installing the subsequent nexthops, its too late
      to recover the already deleted existing route leaving the fib
      in an inconsistent state.
      
      This patch reduces the possibility of this by doing the following:
      a) Changes the existing multipath route add code to a two stage process:
        build rt6_infos + insert them
      	ip6_route_add rt6_info creation code is moved into
      	ip6_route_info_create.
      b) This ensures that most errors are caught during building rt6_infos
        and we fail early
      c) Separates multipath add and del code. Because add needs the special
        two stage mode in a) and delete essentially does not care.
      d) In any event if the code fails during inserting a route again, a
        warning is printed (This should be unlikely)
      
      Before the patch:
      $ip -6 route show
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
      
      /* Try replacing the route with a duplicate nexthop */
      $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
      fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
      swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
      RTNETLINK answers: File exists
      
      $ip -6 route show
      /* previously added ecmp route 3000:1000:1000:1000::2 dissappears from
       * kernel */
      
      After the patch:
      $ip -6 route show
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
      
      /* Try replacing the route with a duplicate nexthop */
      $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
      fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
      swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
      RTNETLINK answers: File exists
      
      $ip -6 route show
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
      
      Fixes: 27596472 ("ipv6: fix ECMP route replacement")
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6b9ea5a6
  8. 01 9月, 2015 3 次提交
  9. 30 8月, 2015 1 次提交
  10. 25 8月, 2015 1 次提交
  11. 21 8月, 2015 4 次提交
  12. 18 8月, 2015 4 次提交
    • T
      lwt: Add support to redirect dst.input · 25368623
      Tom Herbert 提交于
      This patch adds the capability to redirect dst input in the same way
      that dst output is redirected by LWT.
      
      Also, save the original dst.input and and dst.out when setting up
      lwtunnel redirection. These can be called by the client as a pass-
      through.
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      25368623
    • M
      ipv6: Fix a potential deadlock when creating pcpu rt · 9c7370a1
      Martin KaFai Lau 提交于
      rt6_make_pcpu_route() is called under read_lock(&table->tb6_lock).
      rt6_make_pcpu_route() calls ip6_rt_pcpu_alloc(rt) which then
      calls dst_alloc().  dst_alloc() _may_ call ip6_dst_gc() which takes
      the write_lock(&tabl->tb6_lock).  A visualized version:
      
      read_lock(&table->tb6_lock);
      rt6_make_pcpu_route();
      => ip6_rt_pcpu_alloc();
      => dst_alloc();
      => ip6_dst_gc();
      => write_lock(&table->tb6_lock); /* oops */
      
      The fix is to do a read_unlock first before calling ip6_rt_pcpu_alloc().
      
      A reported stack:
      
      [141625.537638] INFO: rcu_sched self-detected stall on CPU { 27}  (t=60000 jiffies g=4159086 c=4159085 q=2139)
      [141625.547469] Task dump for CPU 27:
      [141625.550881] mtr             R  running task        0 22121  22081 0x00000008
      [141625.558069]  0000000000000000 ffff88103f363d98 ffffffff8106e488 000000000000001b
      [141625.565641]  ffffffff81684900 ffff88103f363db8 ffffffff810702b0 0000000008000000
      [141625.573220]  ffffffff81684900 ffff88103f363de8 ffffffff8108df9f ffff88103f375a00
      [141625.580803] Call Trace:
      [141625.583345]  <IRQ>  [<ffffffff8106e488>] sched_show_task+0xc1/0xc6
      [141625.589650]  [<ffffffff810702b0>] dump_cpu_task+0x35/0x39
      [141625.595144]  [<ffffffff8108df9f>] rcu_dump_cpu_stacks+0x6a/0x8c
      [141625.601320]  [<ffffffff81090606>] rcu_check_callbacks+0x1f6/0x5d4
      [141625.607669]  [<ffffffff810940c8>] update_process_times+0x2a/0x4f
      [141625.613925]  [<ffffffff8109fbee>] tick_sched_handle+0x32/0x3e
      [141625.619923]  [<ffffffff8109fc2f>] tick_sched_timer+0x35/0x5c
      [141625.625830]  [<ffffffff81094a1f>] __hrtimer_run_queues+0x8f/0x18d
      [141625.632171]  [<ffffffff81094c9e>] hrtimer_interrupt+0xa0/0x166
      [141625.638258]  [<ffffffff8102bf2a>] local_apic_timer_interrupt+0x4e/0x52
      [141625.645036]  [<ffffffff8102c36f>] smp_apic_timer_interrupt+0x39/0x4a
      [141625.651643]  [<ffffffff8140b9e8>] apic_timer_interrupt+0x68/0x70
      [141625.657895]  <EOI>  [<ffffffff81346ee8>] ? dst_destroy+0x7c/0xb5
      [141625.664188]  [<ffffffff813d45b5>] ? fib6_flush_trees+0x20/0x20
      [141625.670272]  [<ffffffff81082b45>] ? queue_write_lock_slowpath+0x60/0x6f
      [141625.677140]  [<ffffffff8140aa33>] _raw_write_lock_bh+0x23/0x25
      [141625.683218]  [<ffffffff813d4553>] __fib6_clean_all+0x40/0x82
      [141625.689124]  [<ffffffff813d45b5>] ? fib6_flush_trees+0x20/0x20
      [141625.695207]  [<ffffffff813d6058>] fib6_clean_all+0xe/0x10
      [141625.700854]  [<ffffffff813d60d3>] fib6_run_gc+0x79/0xc8
      [141625.706329]  [<ffffffff813d0510>] ip6_dst_gc+0x85/0xf9
      [141625.711718]  [<ffffffff81346d68>] dst_alloc+0x55/0x159
      [141625.717105]  [<ffffffff813d09b5>] __ip6_dst_alloc.isra.32+0x19/0x63
      [141625.723620]  [<ffffffff813d1830>] ip6_pol_route+0x36a/0x3e8
      [141625.729441]  [<ffffffff813d18d6>] ip6_pol_route_output+0x11/0x13
      [141625.735700]  [<ffffffff813f02c8>] fib6_rule_action+0xa7/0x1bf
      [141625.741698]  [<ffffffff813d18c5>] ? ip6_pol_route_input+0x17/0x17
      [141625.748043]  [<ffffffff81357c48>] fib_rules_lookup+0xb5/0x12a
      [141625.754050]  [<ffffffff81141628>] ? poll_select_copy_remaining+0xf9/0xf9
      [141625.761002]  [<ffffffff813f0535>] fib6_rule_lookup+0x37/0x5c
      [141625.766914]  [<ffffffff813d18c5>] ? ip6_pol_route_input+0x17/0x17
      [141625.773260]  [<ffffffff813d008c>] ip6_route_output+0x7a/0x82
      [141625.779177]  [<ffffffff813c44c8>] ip6_dst_lookup_tail+0x53/0x112
      [141625.785437]  [<ffffffff813c45c3>] ip6_dst_lookup_flow+0x2a/0x6b
      [141625.791604]  [<ffffffff813ddaab>] rawv6_sendmsg+0x407/0x9b6
      [141625.797423]  [<ffffffff813d7914>] ? do_ipv6_setsockopt.isra.8+0xd87/0xde2
      [141625.804464]  [<ffffffff8139d4b4>] inet_sendmsg+0x57/0x8e
      [141625.810028]  [<ffffffff81329ba3>] sock_sendmsg+0x2e/0x3c
      [141625.815588]  [<ffffffff8132be57>] SyS_sendto+0xfe/0x143
      [141625.821063]  [<ffffffff813dd551>] ? rawv6_setsockopt+0x5e/0x67
      [141625.827146]  [<ffffffff8132c9f8>] ? sock_common_setsockopt+0xf/0x11
      [141625.833660]  [<ffffffff8132c08c>] ? SyS_setsockopt+0x81/0xa2
      [141625.839565]  [<ffffffff8140ac17>] entry_SYSCALL_64_fastpath+0x12/0x6a
      
      Fixes: d52d3997 ("pv6: Create percpu rt6_info")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: NSteinar H. Gunderson <sgunderson@bigfoot.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c7370a1
    • M
      ipv6: Add rt6_make_pcpu_route() · a73e4195
      Martin KaFai Lau 提交于
      It is a prep work for fixing a potential deadlock when creating
      a pcpu rt.
      
      The current rt6_get_pcpu_route() will also create a pcpu rt if one does not
      exist.  This patch moves the pcpu rt creation logic into another function,
      rt6_make_pcpu_route().
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a73e4195
    • M
      ipv6: Remove un-used argument from ip6_dst_alloc() · ad706862
      Martin KaFai Lau 提交于
      After 4b32b5ad ("ipv6: Stop rt6_info from using inet_peer's metrics"),
      ip6_dst_alloc() does not need the 'table' argument.  This patch
      cleans it up.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad706862
  13. 14 8月, 2015 2 次提交
    • A
      net: ipv6 sysctl option to ignore routes when nexthop link is down · 35103d11
      Andy Gospodarek 提交于
      Like the ipv4 patch with a similar title, this adds a sysctl to allow
      the user to change routing behavior based on whether or not the
      interface associated with the nexthop was an up or down link.  The
      default setting preserves the current behavior, but anyone that enables
      it will notice that nexthops on down interfaces will no longer be
      selected:
      
      net.ipv6.conf.all.ignore_routes_with_linkdown = 0
      net.ipv6.conf.default.ignore_routes_with_linkdown = 0
      net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
      ...
      
      When the above sysctls are set, not only will link status be reported to
      userspace, but an indication that a nexthop is dead and will not be used
      is also reported.
      
      1000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
      1000::/8 via 8000::2 dev p8p1  metric 1024  pref medium
      7000::/8 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
      8000::/8 dev p8p1  proto kernel  metric 256  pref medium
      9000::/8 via 8000::2 dev p8p1  metric 2048  pref medium
      9000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
      fe80::/64 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
      fe80::/64 dev p8p1  proto kernel  metric 256  pref medium
      
      This also adds devconf support and notification when sysctl values
      change.
      
      v2: drop use of rt6i_nhflags since it is not needed right now
      Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35103d11
    • A
      net: track link status of ipv6 nexthops · cea45e20
      Andy Gospodarek 提交于
      Add support to track current link status of ipv6 nexthops to match
      recent changes that added support for ipv4 nexthops.  This takes a
      simple approach to track linkdown status for next-hops and simply
      checks the dev for the dst entry and sets proper flags that to be used
      in the netlink message.
      
      v2: drop use of rt6i_nhflags since it is not needed right now
      Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cea45e20
  14. 11 8月, 2015 1 次提交
  15. 27 7月, 2015 5 次提交
  16. 22 7月, 2015 2 次提交
  17. 04 7月, 2015 1 次提交
  18. 26 5月, 2015 7 次提交