1. 28 Oct, 2016 1 commit
    • net: ipv6: Fix processing of RAs in presence of VRF · 830218c1
      David Ahern committed
      rt6_add_route_info and rt6_add_dflt_router were updated to pull the FIB
      table from the device index, but the corresponding rt6_get_route_info
      and rt6_get_dflt_router functions were not, leading to the failure to
      process RAs:
      
          ICMPv6: RA: ndisc_router_discovery failed to add default route
      
      Fix the 'get' functions by using the table id associated with the
      device when applicable.
      
      Also, now that default routes can be added to tables other than the
      default table, rt6_purge_dflt_routers needs to be updated as well to
      look at all tables. To handle that efficiently, add a flag to the table
      denoting whether it has a default route via RA.
      
      Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
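The per-table flag described in this commit can be pictured with a small Python model. This is only a sketch of the idea, not the kernel code: `Route`, `Fib6Table`, `has_dflt_router`, and `purge_dflt_routers` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    prefix: str
    from_ra: bool = False  # True if the route was learned from an RA


@dataclass
class Fib6Table:
    table_id: int
    routes: list = field(default_factory=list)
    has_dflt_router: bool = False  # hypothetical per-table flag

    def add_dflt_router(self, route):
        # Adding an RA default route marks the table for later purging.
        self.routes.append(route)
        self.has_dflt_router = True


def purge_dflt_routers(tables):
    """Drop RA-learned default routes, visiting only flagged tables."""
    visited = []
    for table in tables:
        if not table.has_dflt_router:
            continue  # this table never received an RA default route
        visited.append(table.table_id)
        table.routes = [r for r in table.routes
                        if not (r.prefix == "::/0" and r.from_ra)]
        table.has_dflt_router = False
    return visited
```

With the flag in place, a purge touches only the tables that actually hold an RA default route instead of walking every route in every table.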
  2. 26 Sep, 2016 1 commit
    • ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route · 2cf75070
      Nikolay Aleksandrov committed
      Since the commit below, the ipmr/ip6mr rtnl_unicast() code uses the portid
      instead of the previous dst_pid, which was copied from in_skb's portid.
      Since the skb is new, the portid is 0 at that point, so the packets are sent
      to the kernel and we get scheduling while atomic or a deadlock (depending
      on where it happens) by trying to acquire rtnl two times.
      Also, since this is RTM_GETROUTE, it can be triggered by a normal user.
      
      Here's the sleeping while atomic trace:
      [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
      [ 7858.212881] 2 locks held by swapper/0/0:
      [ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
      [ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
      [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
      [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
      [ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
      [ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
      [ 7858.215251] Call Trace:
      [ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
      [ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
      [ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
      [ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
      [ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
      [ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
      [ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
      [ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
      [ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
      [ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
      [ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
      [ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
      [ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
      [ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
      [ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
      [ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
      [ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
      [ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
      [ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
      [ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
      [ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
      [ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
      [ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
      [ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
      [ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
      [ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
      [ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
      [ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
      [ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a
      
      Fixes: 2942e900 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 20 Sep, 2016 1 commit
    • net: ipv6: fallback to full lookup if table lookup is unsuitable · a435a07f
      Vincent Bernat committed
      Commit 8c14586f ("net: ipv6: Use passed in table for nexthop
      lookups") introduced a regression: insertion of an IPv6 route in a table
      not containing the appropriate connected route for the gateway but which
      contained a non-connected route (like a default gateway) fails while it
      was previously working:
      
          $ ip link add eth0 type dummy
          $ ip link set up dev eth0
          $ ip addr add 2001:db8::1/64 dev eth0
          $ ip route add ::/0 via 2001:db8::5 dev eth0 table 20
          $ ip route add 2001:db8:cafe::1/128 via 2001:db8::6 dev eth0 table 20
          RTNETLINK answers: No route to host
          $ ip -6 route show table 20
          default via 2001:db8::5 dev eth0  metric 1024  pref medium
      
      After this patch, we get:
      
          $ ip route add 2001:db8:cafe::1/128 via 2001:db8::6 dev eth0 table 20
          $ ip -6 route show table 20
          2001:db8:cafe::1 via 2001:db8::6 dev eth0  metric 1024  pref medium
          default via 2001:db8::5 dev eth0  metric 1024  pref medium
      
      Fixes: 8c14586f ("net: ipv6: Use passed in table for nexthop lookups")
      Signed-off-by: Vincent Bernat <vincent@bernat.im>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Tested-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
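The fallback behaviour this commit describes can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: routes are plain `(prefix, dev, gateway)` tuples, the "full lookup" is reduced to a lookup in a single main table, and `lookup` and `resolve_gateway` are hypothetical helper names, not kernel functions.

```python
import ipaddress


def lookup(routes, addr):
    """Longest-prefix match; each route is (prefix, dev, gateway or None)."""
    addr = ipaddress.ip_address(addr)
    matches = [r for r in routes if addr in ipaddress.ip_network(r[0])]
    return max(matches,
               key=lambda r: ipaddress.ip_network(r[0]).prefixlen,
               default=None)


def resolve_gateway(table_routes, main_routes, gateway):
    """Return the egress device for `gateway`, or None (no route to host)."""
    rt = lookup(table_routes, gateway)
    if rt is None or rt[2] is not None:
        # Nothing found, or only a gateway (non-connected) route such as a
        # default route: the table result is unsuitable, so retry with the
        # stand-in for the full lookup.
        rt = lookup(main_routes, gateway)
    if rt is None or rt[2] is not None:
        return None
    return rt[1]


# Mirrors the example in the commit message: table 20 holds only a default
# route, while the connected /64 lives in the main table.
table_20 = [("::/0", "eth0", "2001:db8::5")]
main = [("2001:db8::/64", "eth0", None)]
```

Here `resolve_gateway(table_20, main, "2001:db8::6")` resolves through the connected route in the main table, which corresponds to the route insertion that succeeds after the patch.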
  4. 19 Sep, 2016 1 commit
  5. 11 Sep, 2016 3 commits
  6. 31 Aug, 2016 1 commit
    • net: lwtunnel: Handle fragmentation · 14972cbd
      Roopa Prabhu committed
      Today the mpls iptunnel lwtunnel_output redirect expects the tunnel
      output function to handle fragmentation. This is OK but can be
      avoided if we do not do the mpls output redirect too early,
      i.e., we could wait until ip fragmentation is done and then call
      mpls output for each ip fragment.
      
      To make this work we need:
      1) the lwtunnel state to carry encap headroom
      2) the redirect to the encap output handler to be done on the ip fragment
      (essentially do the output redirect after fragmentation)
      
      This patch adds tunnel headroom in lwtstate to make sure we
      account for tunnel data in mtu calculations during fragmentation
      and adds a new xmit redirect handler to redirect to the lwtunnel xmit func
      after ip fragmentation.
      
      This includes IPV6 and some mtu fixes and testing from David Ahern.
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 27 Jun, 2016 1 commit
    • ipv6: enforce egress device match in per table nexthop lookups · 48f1dcb5
      Paolo Abeni committed
      With commit 8c14586f ("net: ipv6: Use passed in table for
      nexthop lookups"), the next hop lookup is first performed on route creation
      in the passed-in table.
      However, device match is not enforced in the table lookup, so the found
      route can later be discarded due to egress device mismatch, and no
      global lookup will be performed.
      This causes the following to fail:
      
      ip link add dummy1 type dummy
      ip link add dummy2 type dummy
      ip link set dummy1 up
      ip link set dummy2 up
      ip route add 2001:db8:8086::/48 dev dummy1 metric 20
      ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy1 metric 20
      ip route add 2001:db8:8086::/48 dev dummy2 metric 21
      ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy2 metric 21
      RTNETLINK answers: No route to host
      
      This change fixes the issue by enforcing the device match in
      ip6_nh_lookup_table().
      
      v1->v2: updated commit message title
      
      Fixes: 8c14586f ("net: ipv6: Use passed in table for nexthop lookups")
      Reported-and-tested-by: Beniamino Galvani <bgalvani@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 18 Jun, 2016 1 commit
  9. 16 Jun, 2016 2 commits
    • ipv6: introduce neighbour discovery ops · f997c55c
      Alexander Aring committed
      This patch introduces a neighbour discovery ops callback structure. The
      idea is to separate the handling for 6LoWPAN into the 6lowpan module.
      
      These callbacks offer 6lowpan different handling, such as 802.15.4 short
      address handling or RFC6775 (Neighbor Discovery Optimization for IPv6
      over 6LoWPANs).
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Alexander Aring <aar@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: vrf: Handle ipv6 multicast and link-local addresses · 9ff74384
      David Ahern committed
      IPv6 multicast and link-local addresses require special handling by the
      VRF driver:
      1. Rather than using the VRF device index and full FIB lookups,
         packets to/from these addresses should use direct FIB lookups based on
         the VRF device table.
      
      2. Fail sends/receives on a VRF device to/from a multicast address
         (e.g., make ping6 ff02::1%<vrf> fail).
      
      3. Move the setting of the flow oif to the first dst lookup and revert
         the change in icmpv6_echo_reply made in ca254490 ("net: Add VRF
         support to IPv6 stack"). Linklocal/mcast addresses require use of the
         skb->dev.
      
      With this change, connections into and out of a VRF enslaved device work
      for multicast and link-local addresses (icmp, tcp, and udp),
      e.g.,
      
      1. packets into VM with VRF config:
          ping6 -c3 fe80::e0:f9ff:fe1c:b974%br1
          ping6 -c3 ff02::1%br1
      
          ssh -6 fe80::e0:f9ff:fe1c:b974%br1
      
      2. packets going out a VRF enslaved device:
          ping6 -c3 fe80::18f8:83ff:fe4b:7a2e%eth1
          ping6 -c3 ff02::1%eth1
          ssh -6 root@fe80::18f8:83ff:fe4b:7a2e%eth1
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
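The address classes singled out in this commit (IPv6 multicast and link-local) can be recognised with Python's standard `ipaddress` module. The sketch below only illustrates the classification; `needs_direct_table_lookup` is a hypothetical helper name and says nothing about how the VRF driver itself performs the check.

```python
import ipaddress


def needs_direct_table_lookup(addr):
    """True for IPv6 multicast or link-local addresses."""
    # Drop any "%scope" suffix (e.g. "%br1") before parsing.
    ip = ipaddress.IPv6Address(addr.split("%")[0])
    return ip.is_multicast or ip.is_link_local
```

The addresses used in the commit's ping6 and ssh examples (`fe80::...%br1`, `ff02::1%eth1`) fall into this category, while a global address such as `2001:db8::1` does not.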
  10. 12 Jun, 2016 1 commit
  11. 15 May, 2016 1 commit
    • net/route: enforce hoplimit max value · 626abd59
      Paolo Abeni committed
      Currently, when creating or updating a route, neither the ipv4 nor the
      ipv6 code performs any check on the hoplimit value.
      
      The caller can, e.g., set hoplimit to 256, and when such a route is
      used, packets will be sent with hoplimit/ttl equal to 0.
      
      This commit adds checks for the RTAX_HOPLIMIT value in both the ipv4 and
      ipv6 route code, substituting any value greater than 255 with 255.
      
      This is consistent with what is currently done for ADVMSS and MTU
      in the ipv4 code.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
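The arithmetic behind this fix is easy to check. The sketch below assumes the value ends up in an 8-bit hoplimit/ttl field, which is why 256 shows up as 0; both function names are hypothetical and only illustrate the before/after behaviour described in the commit message.

```python
def hoplimit_unchecked(value):
    """Model of storing the value in an 8-bit field without any check."""
    return value & 0xFF  # only the low byte survives


def hoplimit_clamped(value):
    """Model of the fix: any value greater than 255 is replaced with 255."""
    return min(value, 255)
```

For an input of 256 the unchecked variant yields 0, while the clamped variant yields 255; values already within range pass through unchanged.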
  12. 10 May, 2016 1 commit
  13. 28 Apr, 2016 1 commit
    • net: ipv6: Use passed in table for nexthop lookups · 8c14586f
      David Ahern committed
      Similar to 3bfd8472 ("net: Use passed in table for nexthop lookups")
      for IPv4, if the route spec contains a table id, use that to look up the
      next hop first and fall back to a full lookup if it fails (per the fix
      4c9bcd11 ("net: Fix nexthop lookups")).
      
      Example:
      
          root@kenny:~# ip -6 ro ls table red
          local 2100:1::1 dev lo  proto none  metric 0  pref medium
          2100:1::/120 dev eth1  proto kernel  metric 256  pref medium
          local 2100:2::1 dev lo  proto none  metric 0  pref medium
          2100:2::/120 dev eth2  proto kernel  metric 256  pref medium
          local fe80::e0:f9ff:fe09:3cac dev lo  proto none  metric 0  pref medium
          local fe80::e0:f9ff:fe1c:b974 dev lo  proto none  metric 0  pref medium
          fe80::/64 dev eth1  proto kernel  metric 256  pref medium
          fe80::/64 dev eth2  proto kernel  metric 256  pref medium
          ff00::/8 dev red  metric 256  pref medium
          ff00::/8 dev eth1  metric 256  pref medium
          ff00::/8 dev eth2  metric 256  pref medium
          unreachable default dev lo  metric 240  error -113 pref medium
      
          root@kenny:~# ip -6 ro add table red 2100:3::/64 via 2100:1::64
          RTNETLINK answers: No route to host
      
      Route add fails even though 2100:1::64 is a reachable next hop:
          root@kenny:~# ping6 -I red  2100:1::64
          ping6: Warning: source address might be selected on device other than red.
          PING 2100:1::64(2100:1::64) from 2100:1::1 red: 56 data bytes
          64 bytes from 2100:1::64: icmp_seq=1 ttl=64 time=1.33 ms
      
      With this patch:
          root@kenny:~# ip -6 ro add table red 2100:3::/64 via 2100:1::64
          root@kenny:~# ip -6 ro ls table red
          local 2100:1::1 dev lo  proto none  metric 0  pref medium
          2100:1::/120 dev eth1  proto kernel  metric 256  pref medium
          local 2100:2::1 dev lo  proto none  metric 0  pref medium
          2100:2::/120 dev eth2  proto kernel  metric 256  pref medium
          2100:3::/64 via 2100:1::64 dev eth1  metric 1024  pref medium
          local fe80::e0:f9ff:fe09:3cac dev lo  proto none  metric 0  pref medium
          local fe80::e0:f9ff:fe1c:b974 dev lo  proto none  metric 0  pref medium
          fe80::/64 dev eth1  proto kernel  metric 256  pref medium
          fe80::/64 dev eth2  proto kernel  metric 256  pref medium
          ff00::/8 dev red  metric 256  pref medium
          ff00::/8 dev eth1  metric 256  pref medium
          ff00::/8 dev eth2  metric 256  pref medium
          unreachable default dev lo  metric 240  error -113 pref medium
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 15 Apr, 2016 1 commit
    • ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update · 33c162a9
      Martin KaFai Lau committed
      There is a case with a connected UDP socket in which
      getsockopt(IPV6_MTU) will return a stale MTU value. The sequence to
      reproduce it could be the following:
      1. Create a connected UDP socket
      2. Send some datagrams out
      3. Receive an ICMPV6_PKT_TOOBIG
      4. No new outgoing datagrams trigger the sk_dst_check()
         logic to update the sk->sk_dst_cache.
      5. getsockopt(IPV6_MTU) returns the mtu from the invalid
         sk->sk_dst_cache instead of the newly created RTF_CACHE clone.
      
      This patch updates the sk->sk_dst_cache for a connected datagram sk
      during the pmtu-update code path.
      
      Note that the sk->sk_v6_daddr is used to do the route lookup
      instead of skb->data (i.e. iph).  This is because a UDP socket can become
      connected after sending out some datagrams in un-connected state, or
      it can be connected multiple times to different destinations.  Hence,
      iph may not be related to where sk is currently connected to.
      
      It is done under the '!sock_owned_by_user(sk)' condition because
      the user may make another ip6_datagram_connect() (i.e., changing
      the sk->sk_v6_daddr) while the dst lookup is happening in the pmtu-update
      code path.
      
      For the sock_owned_by_user(sk) == true case, the next patch will
      introduce a release_cb() which will update the sk->sk_dst_cache.
      
      Test:
      
      Server (Connected UDP Socket):
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Route Details:
      [root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac'
      2fac::/64 dev eth0  proto kernel  metric 256  pref medium
      2fac:face::/64 via 2fac::face dev eth0  metric 1024  pref medium
      
      Simple Python code to create a connected UDP socket:
      
      import socket
      import errno
      
      HOST = '2fac::1'
      PORT = 8080
      
      s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
      s.bind((HOST, PORT))
      s.connect(('2fac:face::face', 53))
      print("connected")
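      while True:
          try:
              data = s.recv(1024)
          except socket.error as se:
              # EMSGSIZE is reported once the ICMPV6_PKT_TOOBIG has arrived
              if se.errno == errno.EMSGSIZE:
                  # 41 is IPPROTO_IPV6 and 24 is IPV6_MTU on Linux
                  pmtu = s.getsockopt(41, 24)
                  print("PMTU:%d" % pmtu)
                  break
      s.close()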
      
      Python program output after getting an ICMPV6_PKT_TOOBIG:
      [root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py
      connected
      PMTU:1300
      
      Cache routes after receiving TOOBIG:
      [root@arch-fb-vm1 ~]# ip -6 r show table cache
      2fac:face::face via 2fac::face dev eth0  metric 0
          cache  expires 463sec mtu 1300 pref medium
      
      Client (Send the ICMPV6_PKT_TOOBIG):
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      scapy is used to generate the TOOBIG message.  Here is the scapy script I have
      used:
      
      >>> p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac::1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53)
      >>> sendp(p, iface='qemubr0')
      
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Reported-by: Wei Wang <weiwan@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
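The stale-MTU sequence in this commit can be replayed with a toy model. Everything here is an illustrative assumption: `Dst`, `Sock`, and `pmtu_update` are hypothetical stand-ins for the socket's cached dst and the pmtu-update path, and the `refresh_socket_cache` switch simply toggles between the behaviour before and after the patch.

```python
class Dst:
    def __init__(self, mtu):
        self.mtu = mtu
        self.obsolete = False


class Sock:
    def __init__(self, dst):
        self.dst_cache = dst        # models sk->sk_dst_cache
        self.owned_by_user = False  # models sock_owned_by_user(sk)

    def getsockopt_mtu(self):
        # Reads the MTU from whatever dst the socket currently caches.
        return self.dst_cache.mtu


def pmtu_update(sk, new_mtu, refresh_socket_cache):
    """Create a clone carrying the new MTU; optionally refresh the socket."""
    clone = Dst(new_mtu)
    sk.dst_cache.obsolete = True  # the old entry is no longer valid
    if refresh_socket_cache and not sk.owned_by_user:
        sk.dst_cache = clone
    return clone
```

Without the refresh, `getsockopt_mtu()` keeps returning the old value until some later send replaces the cached dst; with it, the new value is visible immediately, unless the socket is owned by the user at that moment.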
  15. 12 Apr, 2016 1 commit
    • net: vrf: Fix dst reference counting · 9ab179d8
      David Ahern committed
      Vivek reported a kernel exception deleting a VRF with an active
      connection through it. The root cause is that the socket has a cached
      reference to a dst that is destroyed. Converting the dst_destroy to
      dst_release and letting proper reference counting kick in does not
      work as the dst has a reference to the device which needs to be released
      as well.
      
      I talked to Hannes about this at netdev and he pointed out the ipv4 and
      ipv6 dst handling has dst_ifdown for just this scenario. Rather than
      continuing with the reinvented dst wheel in VRF just remove it and
      leverage the ipv4 and ipv6 versions.
      
      Fixes: 193125db ("net: Introduce VRF device driver")
      Fixes: 35402e31 ("net: Add IPv6 support to VRF device")
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 30 Jan, 2016 1 commit
    • ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail() · 6f21c96a
      Paolo Abeni committed
      The current implementation of ip6_dst_lookup_tail basically
      ignores the egress ifindex match: if the saddr is set,
      ip6_route_output() purposefully ignores flowi6_oif, due
      to the commit d46a9d67 ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
      flag if saddr set"); if the saddr is 'any', the first route lookup
      in ip6_dst_lookup_tail fails, but upon failure a second lookup will
      be performed with saddr set, thus ignoring the ifindex constraint.
      
      This commit adds an output route lookup function variant, which
      allows the caller to specify lookup flags, and modifies
      ip6_dst_lookup_tail() to enforce the ifindex match on the second
      lookup via said helper.
      
      ip6_route_output() now becomes a static inline function built on
      top of ip6_route_output_flags(); as a side effect, out-of-tree
      modules now need a GPL license to access the output route lookup
      functionality.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 18 Dec, 2015 1 commit
  18. 02 Dec, 2015 1 commit
  19. 23 Nov, 2015 1 commit
  20. 16 Nov, 2015 3 commits
    • ipv6: Check rt->dst.from for the DST_NOCACHE route · 02bcf4e0
      Martin KaFai Lau committed
      All DST_NOCACHE rt6_info used to have rt->dst.from set to
      its parent.
      
      After commit 8e3d5be7 ("ipv6: Avoid double dst_free"),
      DST_NOCACHE is also set to rt6_info which does not have
      a parent (i.e. rt->dst.from is NULL).
      
      This patch catches the rt->dst.from == NULL case.
      
      Fixes: 8e3d5be7 ("ipv6: Avoid double dst_free")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Check expire on DST_NOCACHE route · 5973fb1e
      Martin KaFai Lau committed
      Since the expires of the DST_NOCACHE rt can be set during
      ip6_rt_update_pmtu(), we also need to consider the expires
      value when doing ip6_dst_check().
      
      This patch creates __rt6_check_expired() to only
      check the expire value (if one exists) of the current rt.
      
      In rt6_dst_from_check(), it adds __rt6_check_expired() as
      one of the condition checks.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
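The "check the expire value (if one exists)" rule can be sketched as a single predicate. This is a Python model under stated assumptions, not the kernel helper: `check_expired_self` is a hypothetical name, time is a plain number rather than jiffies, and the `RTF_EXPIRES` constant is used here only as an illustrative bit marking that an expiry is set.

```python
RTF_EXPIRES = 0x00400000  # illustrative flag bit: the route carries an expiry


def check_expired_self(flags, expires, now):
    """True only if this route has an expiry set and it has passed.

    Unlike a check that also walks to a parent route, this looks at
    the current route alone.
    """
    if not flags & RTF_EXPIRES:
        return False  # no expiry exists, so the route cannot be expired
    return now > expires
```

A route without the flag is never reported as expired, whatever its `expires` field happens to contain.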
    • ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree · 0d3f6d29
      Martin KaFai Lau committed
      The original bug report:
      https://bugzilla.redhat.com/show_bug.cgi?id=1272571
      
      The setup has an IPv4 GRE tunnel running in IPSec.  The bug
      happens when ndisc starts sending router solicitations at the gre
      interface.  The simplified oops stack looks like:
      
      __lock_acquire+0x1b2/0x1c30
      lock_acquire+0xb9/0x140
      _raw_write_lock_bh+0x3f/0x50
      __ip6_ins_rt+0x2e/0x60
      ip6_ins_rt+0x49/0x50
      ~~~~~~~~
      __ip6_rt_update_pmtu.part.54+0x145/0x250
      ip6_rt_update_pmtu+0x2e/0x40
      ~~~~~~~~
      ip_tunnel_xmit+0x1f1/0xf40
      __gre_xmit+0x7a/0x90
      ipgre_xmit+0x15a/0x220
      dev_hard_start_xmit+0x2bd/0x480
      __dev_queue_xmit+0x696/0x730
      dev_queue_xmit+0x10/0x20
      neigh_direct_output+0x11/0x20
      ip6_finish_output2+0x21f/0x770
      ip6_finish_output+0xa7/0x1d0
      ip6_output+0x56/0x190
      ~~~~~~~~
      ndisc_send_skb+0x1d9/0x400
      ndisc_send_rs+0x88/0xc0
      ~~~~~~~~
      
      The rt passed to ip6_rt_update_pmtu() is created by
      icmp6_dst_alloc() and it is not managed by the fib6 tree,
      so its rt6i_table == NULL.  When __ip6_rt_update_pmtu() creates
      a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL
      and it causes the ip6_ins_rt() oops.
      
      During pmtu update, we only want to create a RTF_CACHE clone
      from a rt which is currently managed (or owned) by the
      fib6 tree.  It means either rt->rt6i_node != NULL or
      rt is a RTF_PCPU clone.
      
      It is worth noting that rt6i_table may not be NULL even if the rt is
      not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()).
      Hence, rt6i_node is a better check than rt6i_table.
      
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Reported-by: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
      Cc: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
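The ownership rule stated in this commit ("either rt->rt6i_node != NULL or rt is a RTF_PCPU clone") reduces to a small predicate. The Python below is a sketch for illustration only: `may_create_cache_clone` is a hypothetical name, the route is modelled by two plain arguments, and the `RTF_PCPU` constant is used only as an illustrative flag bit.

```python
RTF_PCPU = 0x40000000  # illustrative flag bit marking a per-cpu clone


def may_create_cache_clone(rt6i_node, flags):
    """True if the route is managed (owned) by the fib6 tree.

    A route counts as managed when it is linked to a tree node or
    when it is a per-cpu clone of such a route.
    """
    return rt6i_node is not None or bool(flags & RTF_PCPU)
```

Under this model, a route that is neither linked into the tree nor a per-cpu clone, like the one from icmp6_dst_alloc() in the oops above, is rejected, so no RTF_CACHE clone is created from it.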
  21. 03 Nov, 2015 1 commit
  22. 22 Oct, 2015 2 commits
  23. 16 Oct, 2015 2 commits
    • ipv6: Initialize rt6_info properly in ip6_blackhole_route() · 0a1f5962
      Martin KaFai Lau committed
      ip6_blackhole_route() does not initialize the newly allocated
      rt6_info properly.  This patch:
      1. Call rt6_info_init() to initialize rt6i_siblings and rt6i_uncached
      
      2. The current rt->dst._metrics init code is incorrect:
         - 'rt->dst._metrics = ort->dst._metrics' is not always safe
         - Not sure what dst_copy_metrics() is trying to do here
           considering ip6_rt_blackhole_cow_metrics() always returns
           NULL
      
         Fix:
         - Always do dst_copy_metrics()
         - Replace ip6_rt_blackhole_cow_metrics() with
           dst_cow_metrics_generic()
      
      3. Mask out the RTF_PCPU bit from the newly allocated blackhole route.
         This bug triggers an oops (reported by Phil Sutter) in rt6_get_cookie().
         It is because RTF_PCPU is set while rt->dst.from is NULL.
      
      Fixes: d52d3997 ("ipv6: Create percpu rt6_info")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Reported-by: Phil Sutter <phil@nwl.cc>
      Tested-by: Phil Sutter <phil@nwl.cc>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: Phil Sutter <phil@nwl.cc>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Move common init code for rt6_info to a new function rt6_info_init() · ebfa45f0
      Martin KaFai Lau committed
      Introduce rt6_info_init() to do the common init work for
      'struct rt6_info' (after calling dst_alloc).
      
      It is prep work for fixing the rt6_info init logic in
      ip6_blackhole_route().
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: Phil Sutter <phil@nwl.cc>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 13 Oct, 2015 3 commits
  25. 08 Oct, 2015 3 commits
  26. 30 Sep, 2015 1 commit
    • net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set · 741a11d9
      David Ahern committed
      Wolfgang reported that IPv6 stack is ignoring oif in output route lookups:
      
          With ipv6, ip -6 route get always returns the specific route.
      
          $ ip -6 r
          2001:db8:e2::1 dev enp2s0  proto kernel  metric 256
          2001:db8:e2::/64 dev enp2s0  metric 1024
          2001:db8:e3::1 dev enp3s0  proto kernel  metric 256
          2001:db8:e3::/64 dev enp3s0  metric 1024
          fe80::/64 dev enp3s0  proto kernel  metric 256
          default via 2001:db8:e3::255 dev enp3s0  metric 1024
      
          $ ip -6 r get 2001:db8:e2::100
          2001:db8:e2::100 from :: dev enp2s0  src 2001:db8:e3::1  metric 0
              cache
      
          $ ip -6 r get 2001:db8:e2::100 oif enp3s0
          2001:db8:e2::100 from :: dev enp2s0  src 2001:db8:e3::1  metric 0
              cache
      
      The stack does consider the oif but a mismatch in rt6_device_match is not
      considered fatal because RT6_LOOKUP_F_IFACE is not set in the flags.
      
      Cc: Wolfgang Nothdurft <netdev@linux-dude.de>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
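The difference between a fatal and a non-fatal device mismatch can be shown with a toy matcher. This is a simplified Python model, not the kernel's rt6_device_match: `device_match` is a hypothetical function, routes are plain dicts, and the `strict` argument stands in for RT6_LOOKUP_F_IFACE being set in the lookup flags.

```python
def device_match(candidates, oif, strict):
    """Pick a route from `candidates` given a requested output interface.

    `strict` models RT6_LOOKUP_F_IFACE: when set, a device mismatch
    is fatal instead of falling back to the first candidate.
    """
    if not oif:
        return candidates[0] if candidates else None
    for rt in candidates:
        if rt["dev"] == oif:
            return rt
    if strict:
        return None  # mismatch is fatal; the caller must look elsewhere
    return candidates[0] if candidates else None


# Mirrors the report: the only matching route points at enp2s0.
routes = [{"dst": "2001:db8:e2::/64", "dev": "enp2s0"}]
```

With `strict=False`, asking for `oif="enp3s0"` still returns the enp2s0 route, which is the behaviour Wolfgang reported; passing `strict=bool(oif)` corresponds to setting the flag whenever an oif is given, as the commit title describes.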
  27. 29 Sep, 2015 1 commit
  28. 25 Sep, 2015 1 commit
  29. 24 Sep, 2015 1 commit