1. 05 2月, 2017 2 次提交
    • D
      net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute · beb1afac
      David Ahern 提交于
      IPv6 returns multipath routes as a series of individual routes making
      their display and handling by userspace different and more complicated
      than IPv4, putting the burden on the user to see that a route is part of
      a multipath route and internally creating a multipath route if desired
      (e.g., libnl does this as of commit 29b71371e764). This patch addresses
      this difference, allowing multipath routes to be returned using the
      RTA_MULTIPATH attribute.
      
      The end result is that IPv6 multipath routes can be treated and displayed
      in a format similar to IPv4:
      
          $ ip -6 ro ls vrf red
          2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:200::/120 metric 1024
      	    nexthop via 2001:db8:1::2  dev eth1 weight 1
      	    nexthop via 2001:db8:2::2  dev eth2 weight 1
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      beb1afac
    • D
      net: ipv6: Allow shorthand delete of all nexthops in multipath route · 0ae81335
      David Ahern 提交于
      IPv4 allows multipath routes to be deleted using just the prefix and
      length. For example:
          $ ip ro ls vrf red
          unreachable default metric 8192
          1.1.1.0/24
              nexthop via 10.100.1.254  dev eth1 weight 1
              nexthop via 10.11.200.2  dev eth11.200 weight 1
          10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
          10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3
      
          $ ip ro del 1.1.1.0/24 vrf red
      
          $ ip ro ls vrf red
          unreachable default metric 8192
          10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
          10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3
      
      The same notation does not work with IPv6 because of how multipath routes
      are implemented for IPv6. For IPv6 only the first nexthop of a multipath
      route is deleted if the request contains only a prefix and length. This
      leads to unnecessary complexity in userspace dealing with IPv6 multipath
      routes.
      
      This patch allows all nexthops to be deleted without specifying each one
      in the delete request. Internally, this is done by walking the sibling
      list of the route matching the specifications given (prefix, length,
      metric, protocol, etc).
      
          $  ip -6 ro ls vrf red
          2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:200::/120 via 2001:db8:1::2 dev eth1 metric 1024  pref medium
          2001:db8:200::/120 via 2001:db8:2::2 dev eth2 metric 1024  pref medium
          ...
      
          $ ip -6 ro del vrf red 2001:db8:200::/120
      
          $ ip -6 ro ls vrf red
          2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          ...
      
      Because IPv6 allows individual nexthops to be deleted without deleting
      the entire route, the ip6_route_multipath_del and non-multipath code
      path (ip6_route_del) have to be discriminated so that all nexthops are
      only deleted for the latter case. This is done by making the existing
      fc_type in fib6_config a u16 and then adding a new u16 field with
      fc_delete_all_nh as the first bit.
      Suggested-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ae81335
  2. 04 2月, 2017 1 次提交
    • D
      net: ipv6: Set protocol to kernel for local routes · 94b5e0f9
      David Ahern 提交于
      IPv6 stack does not set the protocol for local routes, so those routes show
      up with proto "none":
          $ ip -6 ro ls table local
          local ::1 dev lo proto none metric 0  pref medium
          local 2100:3:: dev lo proto none metric 0  pref medium
          local 2100:3::4 dev lo proto none metric 0  pref medium
          local fe80:: dev lo proto none metric 0  pref medium
          ...
      
      Set rt6i_protocol to RTPROT_KERNEL for consistency with IPv4. Now routes
      show up with proto "kernel":
          $ ip -6 ro ls table local
          local ::1 dev lo proto kernel metric 0  pref medium
          local 2100:3:: dev lo proto kernel metric 0  pref medium
          local 2100:3::4 dev lo proto kernel metric 0  pref medium
          local fe80:: dev lo proto kernel metric 0  pref medium
          ...
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      94b5e0f9
  3. 31 1月, 2017 1 次提交
  4. 27 1月, 2017 2 次提交
    • D
      net: ipv6: ignore null_entry on route dumps · 1f17e2f2
      David Ahern 提交于
      lkp-robot reported a BUG:
      [   10.151226] BUG: unable to handle kernel NULL pointer dereference at 00000198
      [   10.152525] IP: rt6_fill_node+0x164/0x4b8
      [   10.153307] *pdpt = 0000000012ee5001 *pde = 0000000000000000
      [   10.153309]
      [   10.154492] Oops: 0000 [#1]
      [   10.154987] CPU: 0 PID: 909 Comm: netifd Not tainted 4.10.0-rc4-00722-g41e8c70e-dirty #10
      [   10.156482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [   10.158254] task: d0deb000 task.stack: d0e0c000
      [   10.159059] EIP: rt6_fill_node+0x164/0x4b8
      [   10.159780] EFLAGS: 00010296 CPU: 0
      [   10.160404] EAX: 00000000 EBX: d10c2358 ECX: c1f7c6cc EDX: c1f6ff44
      [   10.161469] ESI: 00000000 EDI: c2059900 EBP: d0e0dc4c ESP: d0e0dbe4
      [   10.162534]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
      [   10.163482] CR0: 80050033 CR2: 00000198 CR3: 10d94660 CR4: 000006b0
      [   10.164535] Call Trace:
      [   10.164993]  ? paravirt_sched_clock+0x9/0xd
      [   10.165727]  ? sched_clock+0x9/0xc
      [   10.166329]  ? sched_clock_cpu+0x19/0xe9
      [   10.166991]  ? lock_release+0x13e/0x36c
      [   10.167652]  rt6_dump_route+0x4c/0x56
      [   10.168276]  fib6_dump_node+0x1d/0x3d
      [   10.168913]  fib6_walk_continue+0xab/0x167
      [   10.169611]  fib6_walk+0x2a/0x40
      [   10.170182]  inet6_dump_fib+0xfb/0x1e0
      [   10.170855]  netlink_dump+0xcd/0x21f
      
      This happens when the loopback device is set down and a ipv6 fib route
      dump is requested.
      
      ip6_null_entry is the root of all ipv6 fib tables making it integrated
      into the table and hence passed to the ipv6 route dump code. The
      null_entry route uses the loopback device for dst.dev but may not have
      rt6i_idev set because of the order in which initializations are done --
      ip6_route_net_init is run before addrconf_init has initialized the
      loopback device. Fixing the initialization order is a much bigger problem
      with no obvious solution thus far.
      
      The BUG is triggered when the loopback is set down and the netif_running
      check added by a1a22c12 fails. The fill_node descends to checking
      rt->rt6i_idev for ignore_routes_with_linkdown and since rt6i_idev is
      NULL it faults.
      
      The null_entry route should not be processed in a dump request. Catch
      and ignore. This check is done in rt6_dump_route as it is the highest
      place in the callchain with knowledge of both the route and the network
      namespace.
      
      Fixes: a1a22c12("net: ipv6: Keep nexthop of multipath route on admin down")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f17e2f2
    • D
      net: ipv6: remove skb_reserve in getroute · 3b7b2b0a
      David Ahern 提交于
      Remove skb_reserve and skb_reset_mac_header from inet6_rtm_getroute. The
      allocated skb is not passed through the routing engine (like it is for
      IPv4) and has not since the beginning of git time.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b7b2b0a
  5. 20 1月, 2017 1 次提交
    • D
      net: ipv6: Keep nexthop of multipath route on admin down · a1a22c12
      David Ahern 提交于
      IPv6 deletes route entries associated with multipath routes on an
      admin down where IPv4 does not. For example:
          $ ip ro ls vrf red
          unreachable default metric 8192
          1.1.1.0/24 metric 64
                  nexthop via 10.100.1.254  dev eth1 weight 1
                  nexthop via 10.100.2.254  dev eth2 weight 1
          10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.4
          10.100.2.0/24 dev eth2 proto kernel scope link src 10.100.2.4
      
          $ ip -6 ro ls vrf red
          2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
          2001:db8:2:: dev red proto none metric 0  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:11::/120 via 2001:db8:1::16 dev eth1 metric 1024  pref medium
          2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
          ...
      
      Set link down:
          $ ip li set eth1 down
      
      IPv4 retains the multihop route but flags eth1 route as dead:
      
          $ ip ro ls vrf red
          unreachable default metric 8192
          1.1.1.0/24
                  nexthop via 10.100.1.16  dev eth1 weight 1 dead linkdown
                  nexthop via 10.100.2.16  dev eth2 weight 1
          10.100.2.0/24 dev eth2 proto kernel scope link src 10.100.2.4
      
      and IPv6 deletes the route as part of flushing all routes for the device:
      
          $ ip -6 ro ls vrf red
          2001:db8:2:: dev red proto none metric 0  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
          ...
      
      Worse, on admin up of the device the multipath route has to be deleted
      to get this leg of the route re-added.
      
      This patch keeps routes that are part of a multipath route if
      ignore_routes_with_linkdown is set with the dead and linkdown flags
      enabling consistency between IPv4 and IPv6:
      
          $ ip -6 ro ls vrf red
          2001:db8:2:: dev red proto none metric 0  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:11::/120 via 2001:db8:1::16 dev eth1 metric 1024 dead linkdown  pref medium
          2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
          ...
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1a22c12
  6. 19 1月, 2017 3 次提交
    • D
      lwtunnel: fix autoload of lwt modules · 9ed59592
      David Ahern 提交于
      Trying to add an mpls encap route when the MPLS modules are not loaded
      hangs. For example:
      
          CONFIG_MPLS=y
          CONFIG_NET_MPLS_GSO=m
          CONFIG_MPLS_ROUTING=m
          CONFIG_MPLS_IPTUNNEL=m
      
          $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
      The ip command hangs:
      root       880   826  0 21:25 pts/0    00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
          $ cat /proc/880/stack
          [<ffffffff81065a9b>] call_usermodehelper_exec+0xd6/0x134
          [<ffffffff81065efc>] __request_module+0x27b/0x30a
          [<ffffffff814542f6>] lwtunnel_build_state+0xe4/0x178
          [<ffffffff814aa1e4>] fib_create_info+0x47f/0xdd4
          [<ffffffff814ae451>] fib_table_insert+0x90/0x41f
          [<ffffffff814a8010>] inet_rtm_newroute+0x4b/0x52
          ...
      
      modprobe is trying to load rtnl-lwt-MPLS:
      
      root       881     5  0 21:25 ?        00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS
      
      and it hangs after loading mpls_router:
      
          $ cat /proc/881/stack
          [<ffffffff81441537>] rtnl_lock+0x12/0x14
          [<ffffffff8142ca2a>] register_netdevice_notifier+0x16/0x179
          [<ffffffffa0033025>] mpls_init+0x25/0x1000 [mpls_router]
          [<ffffffff81000471>] do_one_initcall+0x8e/0x13f
          [<ffffffff81119961>] do_init_module+0x5a/0x1e5
          [<ffffffff810bd070>] load_module+0x13bd/0x17d6
          ...
      
      The problem is that lwtunnel_build_state is called with rtnl lock
      held preventing mpls_init from registering.
      
      Given the potential references held by the time lwtunnel_build_state it
      can not drop the rtnl lock to the load module. So, extract the module
      loading code from lwtunnel_build_state into a new function to validate
      the encap type. The new function is called while converting the user
      request into a fib_config which is well before any table, device or
      fib entries are examined.
      
      Fixes: 745041e2 ("lwtunnel: autoload of lwt modules")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ed59592
    • D
      net: ipv6: remove prefix arg to rt6_fill_node · f8cfe2ce
      David Ahern 提交于
      The prefix arg to rt6_fill_node is non-0 in only 1 path - rt6_dump_route
      where a user is requesting a prefix only dump. Simplify rt6_fill_node
      by removing the prefix arg and moving the prefix check to rt6_dump_route.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8cfe2ce
    • D
      net: ipv6: remove nowait arg to rt6_fill_node · fd61c6ba
      David Ahern 提交于
      All callers of rt6_fill_node pass 0 for nowait arg. Remove the arg and
      simplify rt6_fill_node accordingly.
      
      rt6_fill_node passes the nowait of 0 to ip6mr_get_route. Remove the
      nowait arg from it as well.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd61c6ba
  7. 13 1月, 2017 1 次提交
  8. 10 1月, 2017 1 次提交
  9. 25 12月, 2016 1 次提交
  10. 18 12月, 2016 1 次提交
  11. 06 12月, 2016 1 次提交
    • E
      ipv6: Allow IPv4-mapped address as next-hop · 96d5822c
      Erik Nordmark 提交于
      Made kernel accept IPv6 routes with IPv4-mapped address as next-hop.
      
      It is possible to configure IP interfaces with IPv4-mapped addresses, and
      one can add IPv6 routes for IPv4-mapped destinations/prefixes, yet prior
      to this fix the kernel returned an EINVAL when attempting to add an IPv6
      route with an IPv4-mapped address as a nexthop/gateway.
      
      RFC 4798 (a proposed standard RFC) uses IPv4-mapped addresses as nexthops,
      thus in order to support that type of address configuration the kernel
      needs to allow IPv4-mapped addresses as nexthops.
      Signed-off-by: NErik Nordmark <nordmark@arista.com>
      Signed-off-by: NBob Gilligan <gilligan@arista.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96d5822c
  12. 04 12月, 2016 1 次提交
  13. 10 11月, 2016 1 次提交
    • M
      net-ipv6: on device mtu change do not add mtu to mtu-less routes · fb56be83
      Maciej Żenczykowski 提交于
      Routes can specify an mtu explicitly or inherit the mtu from
      the underlying device - this inheritance is implemented in
      dst->ops->mtu handlers ip6_mtu() and ip6_blackhole_mtu().
      
      Currently changing the mtu of a device adds mtu explicitly
      to routes using that device.
      
      ie.
        # ip link set dev lo mtu 65536
        # ip -6 route add local 2000::1 dev lo
        # ip -6 route get 2000::1
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
      
        # ip link set dev lo mtu 65535
        # ip -6 route get 2000::1
        local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65535 pref medium
      
        # ip link set dev lo mtu 65536
        # ip -6 route get 2000::1
        local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium
      
        # ip -6 route del local 2000::1
      
      After this patch the route entry no longer changes unless it already has an mtu.
      There is no need: this inheritance is already done in ip6_mtu()
      
        # ip link set dev lo mtu 65536
        # ip -6 route add local 2000::1 dev lo
        # ip -6 route add local 2000::2 dev lo mtu 2000
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium
      
        # ip link set dev lo mtu 65535
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium
      
        # ip link set dev lo mtu 1501
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 1501 pref medium
      
        # ip link set dev lo mtu 65536
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium
      
        # ip -6 route del local 2000::1
        # ip -6 route del local 2000::2
      
      This is desirable because changing device mtu and then resetting it
      to the previous value shouldn't change the user visible routing table.
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      CC: Eric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fb56be83
  14. 05 11月, 2016 2 次提交
    • L
      net: inet: Support UID-based routing in IP protocols. · e2d118a1
      Lorenzo Colitti 提交于
      - Use the UID in routing lookups made by protocol connect() and
        sendmsg() functions.
      - Make sure that routing lookups triggered by incoming packets
        (e.g., Path MTU discovery) take the UID of the socket into
        account.
      - For packets not associated with a userspace socket, (e.g., ping
        replies) use UID 0 inside the user namespace corresponding to
        the network namespace the socket belongs to. This allows
        all namespaces to apply routing and iptables rules to
        kernel-originated traffic in that namespaces by matching UID 0.
        This is better than using the UID of the kernel socket that is
        sending the traffic, because the UID of kernel sockets created
        at namespace creation time (e.g., the per-processor ICMP and
        TCP sockets) is the UID of the user that created the socket,
        which might not be mapped in the namespace.
      
      Tested: compiles allnoconfig, allyesconfig, allmodconfig
      Tested: https://android-review.googlesource.com/253302Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2d118a1
    • L
      net: core: add UID to flows, rules, and routes · 622ec2c9
      Lorenzo Colitti 提交于
      - Define a new FIB rule attributes, FRA_UID_RANGE, to describe a
        range of UIDs.
      - Define a RTA_UID attribute for per-UID route lookups and dumps.
      - Support passing these attributes to and from userspace via
        rtnetlink. The value INVALID_UID indicates no UID was
        specified.
      - Add a UID field to the flow structures.
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      622ec2c9
  15. 01 11月, 2016 1 次提交
  16. 28 10月, 2016 2 次提交
    • D
      net: ipv6: Do not consider link state for nexthop validation · d5d32e4b
      David Ahern 提交于
      Similar to IPv4, do not consider link state when validating next hops.
      
      Currently, if the link is down default routes can fail to insert:
       $ ip -6 ro add vrf blue default via 2100:2::64 dev eth2
       RTNETLINK answers: No route to host
      
      With this patch the command succeeds.
      
      Fixes: 8c14586f ("net: ipv6: Use passed in table for nexthop lookups")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5d32e4b
    • D
      net: ipv6: Fix processing of RAs in presence of VRF · 830218c1
      David Ahern 提交于
      rt6_add_route_info and rt6_add_dflt_router were updated to pull the FIB
      table from the device index, but the corresponding rt6_get_route_info
      and rt6_get_dflt_router functions were not leading to the failure to
      process RA's:
      
          ICMPv6: RA: ndisc_router_discovery failed to add default route
      
      Fix the 'get' functions by using the table id associated with the
      device when applicable.
      
      Also, now that default routes can be added to tables other than the
      default table, rt6_purge_dflt_routers needs to be updated as well to
      look at all tables. To handle that efficiently, add a flag to the table
      denoting if it is has a default route via RA.
      
      Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      830218c1
  17. 26 9月, 2016 1 次提交
    • N
      ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route · 2cf75070
      Nikolay Aleksandrov 提交于
      Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid
      instead of the previous dst_pid which was copied from in_skb's portid.
      Since the skb is new the portid is 0 at that point so the packets are sent
      to the kernel and we get scheduling while atomic or a deadlock (depending
      on where it happens) by trying to acquire rtnl two times.
      Also since this is RTM_GETROUTE, it can be triggered by a normal user.
      
      Here's the sleeping while atomic trace:
      [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
      [ 7858.212881] 2 locks held by swapper/0/0:
      [ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
      [ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
      [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
      [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
      [ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
      [ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
      [ 7858.215251] Call Trace:
      [ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
      [ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
      [ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
      [ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
      [ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
      [ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
      [ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
      [ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
      [ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
      [ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
      [ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
      [ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
      [ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
      [ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
      [ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
      [ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
      [ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
      [ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
      [ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
      [ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
      [ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
      [ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
      [ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
      [ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
      [ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
      [ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
      [ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
      [ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
      [ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a
      
      Fixes: 2942e900 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2cf75070
  18. 20 9月, 2016 1 次提交
    • V
      net: ipv6: fallback to full lookup if table lookup is unsuitable · a435a07f
      Vincent Bernat 提交于
      Commit 8c14586f ("net: ipv6: Use passed in table for nexthop
      lookups") introduced a regression: insertion of an IPv6 route in a table
      not containing the appropriate connected route for the gateway but which
      contained a non-connected route (like a default gateway) fails while it
      was previously working:
      
          $ ip link add eth0 type dummy
          $ ip link set up dev eth0
          $ ip addr add 2001:db8::1/64 dev eth0
          $ ip route add ::/0 via 2001:db8::5 dev eth0 table 20
          $ ip route add 2001:db8:cafe::1/128 via 2001:db8::6 dev eth0 table 20
          RTNETLINK answers: No route to host
          $ ip -6 route show table 20
          default via 2001:db8::5 dev eth0  metric 1024  pref medium
      
      After this patch, we get:
      
          $ ip route add 2001:db8:cafe::1/128 via 2001:db8::6 dev eth0 table 20
          $ ip -6 route show table 20
          2001:db8:cafe::1 via 2001:db8::6 dev eth0  metric 1024  pref medium
          default via 2001:db8::5 dev eth0  metric 1024  pref medium
      
      Fixes: 8c14586f ("net: ipv6: Use passed in table for nexthop lookups")
      Signed-off-by: NVincent Bernat <vincent@bernat.im>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a435a07f
  19. 19 9月, 2016 1 次提交
  20. 11 9月, 2016 3 次提交
  21. 31 8月, 2016 1 次提交
    • R
      net: lwtunnel: Handle fragmentation · 14972cbd
      Roopa Prabhu 提交于
      Today mpls iptunnel lwtunnel_output redirect expects the tunnel
      output function to handle fragmentation. This is ok but can be
      avoided if we did not do the mpls output redirect too early.
      ie we could wait until ip fragmentation is done and then call
      mpls output for each ip fragment.
      
      To make this work we will need,
      1) the lwtunnel state to carry encap headroom
      2) and do the redirect to the encap output handler on the ip fragment
      (essentially do the output redirect after fragmentation)
      
      This patch adds tunnel headroom in lwtstate to make sure we
      account for tunnel data in mtu calculations during fragmentation
      and adds new xmit redirect handler to redirect to lwtunnel xmit func
      after ip fragmentation.
      
      This includes IPV6 and some mtu fixes and testing from David Ahern.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14972cbd
  22. 27 6月, 2016 1 次提交
    • P
      ipv6: enforce egress device match in per table nexthop lookups · 48f1dcb5
      Paolo Abeni 提交于
      with the commit 8c14586f ("net: ipv6: Use passed in table for
      nexthop lookups"), net hop lookup is first performed on route creation
      in the passed-in table.
      However device match is not enforced in table lookup, so the found
      route can be later discarded due to egress device mismatch and no
      global lookup will be performed.
      This cause the following to fail:
      
      ip link add dummy1 type dummy
      ip link add dummy2 type dummy
      ip link set dummy1 up
      ip link set dummy2 up
      ip route add 2001:db8:8086::/48 dev dummy1 metric 20
      ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy1 metric 20
      ip route add 2001:db8:8086::/48 dev dummy2 metric 21
      ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy2 metric 21
      RTNETLINK answers: No route to host
      
      This change fixes the issue enforcing device lookup in
      ip6_nh_lookup_table()
      
      v1->v2: updated commit message title
      
      Fixes: 8c14586f ("net: ipv6: Use passed in table for nexthop lookups")
      Reported-and-tested-by: NBeniamino Galvani <bgalvani@redhat.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48f1dcb5
  23. 18 6月, 2016 1 次提交
  24. 16 6月, 2016 2 次提交
    • A
      ipv6: introduce neighbour discovery ops · f997c55c
      Alexander Aring 提交于
      This patch introduces neighbour discovery ops callback structure. The
      idea is to separate the handling for 6LoWPAN into the 6lowpan module.
      
      These callback offers 6lowpan different handling, such as 802.15.4 short
      address handling or RFC6775 (Neighbor Discovery Optimization for IPv6
      over 6LoWPANs).
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Acked-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NAlexander Aring <aar@pengutronix.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f997c55c
    • D
      net: vrf: Handle ipv6 multicast and link-local addresses · 9ff74384
      David Ahern 提交于
      IPv6 multicast and link-local addresses require special handling by the
      VRF driver:
      1. Rather than using the VRF device index and full FIB lookups,
         packets to/from these addresses should use direct FIB lookups based on
         the VRF device table.
      
      2. fail sends/receives on a VRF device to/from a multicast address
         (e.g, make ping6 ff02::1%<vrf> fail)
      
      3. move the setting of the flow oif to the first dst lookup and revert
         the change in icmpv6_echo_reply made in ca254490 ("net: Add VRF
         support to IPv6 stack"). Linklocal/mcast addresses require use of the
         skb->dev.
      
      With this change connections into and out of a VRF enslaved device work
      for multicast and link-local addresses work (icmp, tcp, and udp)
      e.g.,
      
      1. packets into VM with VRF config:
          ping6 -c3 fe80::e0:f9ff:fe1c:b974%br1
          ping6 -c3 ff02::1%br1
      
          ssh -6 fe80::e0:f9ff:fe1c:b974%br1
      
      2. packets going out a VRF enslaved device:
          ping6 -c3 fe80::18f8:83ff:fe4b:7a2e%eth1
          ping6 -c3 ff02::1%eth1
          ssh -6 root@fe80::18f8:83ff:fe4b:7a2e%eth1
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ff74384
  25. 12 6月, 2016 1 次提交
  26. 15 5月, 2016 1 次提交
    • P
      net/route: enforce hoplimit max value · 626abd59
      Paolo Abeni 提交于
      Currently, when creating or updating a route, no check is performed
      in both ipv4 and ipv6 code to the hoplimit value.
      
      The caller can i.e. set hoplimit to 256, and when such route will
       be used, packets will be sent with hoplimit/ttl equal to 0.
      
      This commit adds checks for the RTAX_HOPLIMIT value, in both ipv4
      ipv6 route code, substituting any value greater than 255 with 255.
      
      This is consistent with what is currently done for ADVMSS and MTU
      in the ipv4 code.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      626abd59
  27. 10 5月, 2016 1 次提交
  28. 28 4月, 2016 1 次提交
    • D
      net: ipv6: Use passed in table for nexthop lookups · 8c14586f
      David Ahern 提交于
      Similar to 3bfd8472 ("net: Use passed in table for nexthop lookups")
      for IPv4, if the route spec contains a table id use that to lookup the
      next hop first and fall back to a full lookup if it fails (per the fix
      4c9bcd11 ("net: Fix nexthop lookups")).
      
      Example:
      
          root@kenny:~# ip -6 ro ls table red
          local 2100:1::1 dev lo  proto none  metric 0  pref medium
          2100:1::/120 dev eth1  proto kernel  metric 256  pref medium
          local 2100:2::1 dev lo  proto none  metric 0  pref medium
          2100:2::/120 dev eth2  proto kernel  metric 256  pref medium
          local fe80::e0:f9ff:fe09:3cac dev lo  proto none  metric 0  pref medium
          local fe80::e0:f9ff:fe1c:b974 dev lo  proto none  metric 0  pref medium
          fe80::/64 dev eth1  proto kernel  metric 256  pref medium
          fe80::/64 dev eth2  proto kernel  metric 256  pref medium
          ff00::/8 dev red  metric 256  pref medium
          ff00::/8 dev eth1  metric 256  pref medium
          ff00::/8 dev eth2  metric 256  pref medium
          unreachable default dev lo  metric 240  error -113 pref medium
      
          root@kenny:~# ip -6 ro add table red 2100:3::/64 via 2100:1::64
          RTNETLINK answers: No route to host
      
      Route add fails even though 2100:1::64 is a reachable next hop:
          root@kenny:~# ping6 -I red  2100:1::64
          ping6: Warning: source address might be selected on device other than red.
          PING 2100:1::64(2100:1::64) from 2100:1::1 red: 56 data bytes
          64 bytes from 2100:1::64: icmp_seq=1 ttl=64 time=1.33 ms
      
      With this patch:
          root@kenny:~# ip -6 ro add table red 2100:3::/64 via 2100:1::64
          root@kenny:~# ip -6 ro ls table red
          local 2100:1::1 dev lo  proto none  metric 0  pref medium
          2100:1::/120 dev eth1  proto kernel  metric 256  pref medium
          local 2100:2::1 dev lo  proto none  metric 0  pref medium
          2100:2::/120 dev eth2  proto kernel  metric 256  pref medium
          2100:3::/64 via 2100:1::64 dev eth1  metric 1024  pref medium
          local fe80::e0:f9ff:fe09:3cac dev lo  proto none  metric 0  pref medium
          local fe80::e0:f9ff:fe1c:b974 dev lo  proto none  metric 0  pref medium
          fe80::/64 dev eth1  proto kernel  metric 256  pref medium
          fe80::/64 dev eth2  proto kernel  metric 256  pref medium
          ff00::/8 dev red  metric 256  pref medium
          ff00::/8 dev eth1  metric 256  pref medium
          ff00::/8 dev eth2  metric 256  pref medium
          unreachable default dev lo  metric 240  error -113 pref medium
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8c14586f
  29. 15 4月, 2016 1 次提交
    • M
      ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update · 33c162a9
      Martin KaFai Lau 提交于
      There is a case in connected UDP socket such that
      getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible
      sequence could be the following:
      1. Create a connected UDP socket
      2. Send some datagrams out
      3. Receive a ICMPV6_PKT_TOOBIG
      4. No new outgoing datagrams to trigger the sk_dst_check()
         logic to update the sk->sk_dst_cache.
      5. getsockopt(IPV6_MTU) returns the mtu from the invalid
         sk->sk_dst_cache instead of the newly created RTF_CACHE clone.
      
      This patch updates the sk->sk_dst_cache for a connected datagram sk
      during pmtu-update code path.
      
      Note that the sk->sk_v6_daddr is used to do the route lookup
      instead of skb->data (i.e. iph).  It is because a UDP socket can become
      connected after sending out some datagrams in un-connected state.  or
      It can be connected multiple times to different destinations.  Hence,
      iph may not be related to where sk is currently connected to.
      
      It is done under '!sock_owned_by_user(sk)' condition because
      the user may make another ip6_datagram_connect()  (i.e changing
      the sk->sk_v6_daddr) while dst lookup is happening in the pmtu-update
      code path.
      
      For the sock_owned_by_user(sk) == true case, the next patch will
      introduce a release_cb() which will update the sk->sk_dst_cache.
      
      Test:
      
      Server (Connected UDP Socket):
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Route Details:
      [root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac'
      2fac::/64 dev eth0  proto kernel  metric 256  pref medium
      2fac:face::/64 via 2fac::face dev eth0  metric 1024  pref medium
      
      A simple python code to create a connected UDP socket:
      
      import socket
      import errno
      
      HOST = '2fac::1'
      PORT = 8080
      
      s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
      s.bind((HOST, PORT))
      s.connect(('2fac:face::face', 53))
      print("connected")
      while True:
          try:
      	data = s.recv(1024)
          except socket.error as se:
      	if se.errno == errno.EMSGSIZE:
      		pmtu = s.getsockopt(41, 24)
      		print("PMTU:%d" % pmtu)
      		break
      s.close()
      
      Python program output after getting a ICMPV6_PKT_TOOBIG:
      [root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py
      connected
      PMTU:1300
      
      Cache routes after recieving TOOBIG:
      [root@arch-fb-vm1 ~]# ip -6 r show table cache
      2fac:face::face via 2fac::face dev eth0  metric 0
          cache  expires 463sec mtu 1300 pref medium
      
      Client (Send the ICMPV6_PKT_TOOBIG):
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      scapy is used to generate the TOOBIG message.  Here is the scapy script I have
      used:
      
      >>> p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac::
      1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53)
      >>> sendp(p, iface='qemubr0')
      
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Reported-by: NWei Wang <weiwan@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33c162a9
  30. 12 4月, 2016 1 次提交
    • D
      net: vrf: Fix dst reference counting · 9ab179d8
      David Ahern 提交于
      Vivek reported a kernel exception deleting a VRF with an active
      connection through it. The root cause is that the socket has a cached
      reference to a dst that is destroyed. Converting the dst_destroy to
      dst_release and letting proper reference counting kick in does not
      work as the dst has a reference to the device which needs to be released
      as well.
      
      I talked to Hannes about this at netdev and he pointed out the ipv4 and
      ipv6 dst handling has dst_ifdown for just this scenario. Rather than
      continuing with the reinvented dst wheel in VRF just remove it and
      leverage the ipv4 and ipv6 versions.
      
      Fixes: 193125db ("net: Introduce VRF device driver")
      Fixes: 35402e31 ("net: Add IPv6 support to VRF device")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ab179d8
  31. 30 1月, 2016 1 次提交
    • P
      ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail() · 6f21c96a
      Paolo Abeni 提交于
      The current implementation of ip6_dst_lookup_tail basically
      ignore the egress ifindex match: if the saddr is set,
      ip6_route_output() purposefully ignores flowi6_oif, due
      to the commit d46a9d67 ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
      flag if saddr set"), if the saddr is 'any' the first route lookup
      in ip6_dst_lookup_tail fails, but upon failure a second lookup will
      be performed with saddr set, thus ignoring the ifindex constraint.
      
      This commit adds an output route lookup function variant, which
      allows the caller to specify lookup flags, and modify
      ip6_dst_lookup_tail() to enforce the ifindex match on the second
      lookup via said helper.
      
      ip6_route_output() becames now a static inline function build on
      top of ip6_route_output_flags(); as a side effect, out-of-tree
      modules need now a GPL license to access the output route lookup
      functionality.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6f21c96a