1. 04 Jun, 2016 1 commit
  2. 09 Apr, 2016 1 commit
    • mpls: find_outdev: check for err ptr in addition to NULL check · 94a57f1f
      Roopa Prabhu committed
      find_outdev calls inet{,6}_fib_lookup_dev() or dev_get_by_index() to
      find the output device. In case of an error, inet{,6}_fib_lookup_dev()
      returns an error pointer and dev_get_by_index() returns NULL. But the function
      only checks for NULL and thus can end up calling dev_put on an ERR_PTR.
      This patch adds an additional check for err ptr after the NULL check.
      
      Before: Trying to add an mpls route with no oif from user, no available
      path to 10.1.1.8 and no default route:
      $ip -f mpls route add 100 as 200 via inet 10.1.1.8
      [  822.337195] BUG: unable to handle kernel NULL pointer dereference at
      00000000000003a3
      [  822.340033] IP: [<ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182
      [  822.340033] PGD 1db38067 PUD 1de9e067 PMD 0
      [  822.340033] Oops: 0000 [#1] SMP
      [  822.340033] Modules linked in:
      [  822.340033] CPU: 0 PID: 11148 Comm: ip Not tainted 4.5.0-rc7+ #54
      [  822.340033] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org
      04/01/2014
      [  822.340033] task: ffff88001db82580 ti: ffff88001dad4000 task.ti:
      ffff88001dad4000
      [  822.340033] RIP: 0010:[<ffffffff8148781e>]  [<ffffffff8148781e>]
      mpls_nh_assign_dev+0x10b/0x182
      [  822.340033] RSP: 0018:ffff88001dad7a88  EFLAGS: 00010282
      [  822.340033] RAX: ffffffffffffff9b RBX: ffffffffffffff9b RCX:
      0000000000000002
      [  822.340033] RDX: 00000000ffffff9b RSI: 0000000000000008 RDI:
      0000000000000000
      [  822.340033] RBP: ffff88001ddc9ea0 R08: ffff88001e9f1768 R09:
      0000000000000000
      [  822.340033] R10: ffff88001d9c1100 R11: ffff88001e3c89f0 R12:
      ffffffff8187e0c0
      [  822.340033] R13: ffffffff8187e0c0 R14: ffff88001ddc9e80 R15:
      0000000000000004
      [  822.340033] FS:  00007ff9ed798700(0000) GS:ffff88001fc00000(0000)
      knlGS:0000000000000000
      [  822.340033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  822.340033] CR2: 00000000000003a3 CR3: 000000001de89000 CR4:
      00000000000006f0
      [  822.340033] Stack:
      [  822.340033]  0000000000000000 0000000100000000 0000000000000000
      0000000000000000
      [  822.340033]  0000000000000000 0801010a00000000 0000000000000000
      0000000000000000
      [  822.340033]  0000000000000004 ffffffff8148749b ffffffff8187e0c0
      000000000000001c
      [  822.340033] Call Trace:
      [  822.340033]  [<ffffffff8148749b>] ? mpls_rt_alloc+0x2b/0x3e
      [  822.340033]  [<ffffffff81488e66>] ? mpls_rtm_newroute+0x358/0x3e2
      [  822.340033]  [<ffffffff810e7bbc>] ? get_page+0x5/0xa
      [  822.340033]  [<ffffffff813b7d94>] ? rtnetlink_rcv_msg+0x17e/0x191
      [  822.340033]  [<ffffffff8111794e>] ? __kmalloc_track_caller+0x8c/0x9e
      [  822.340033]  [<ffffffff813c9393>] ?
      rht_key_hashfn.isra.20.constprop.57+0x14/0x1f
      [  822.340033]  [<ffffffff813b7c16>] ? __rtnl_unlock+0xc/0xc
      [  822.340033]  [<ffffffff813cb794>] ? netlink_rcv_skb+0x36/0x82
      [  822.340033]  [<ffffffff813b4507>] ? rtnetlink_rcv+0x1f/0x28
      [  822.340033]  [<ffffffff813cb2b1>] ? netlink_unicast+0x106/0x189
      [  822.340033]  [<ffffffff813cb5b3>] ? netlink_sendmsg+0x27f/0x2c8
      [  822.340033]  [<ffffffff81392ede>] ? sock_sendmsg_nosec+0x10/0x1b
      [  822.340033]  [<ffffffff81393df1>] ? ___sys_sendmsg+0x182/0x1e3
      [  822.340033]  [<ffffffff810e4f35>] ?
      __alloc_pages_nodemask+0x11c/0x1e4
      [  822.340033]  [<ffffffff8110619c>] ? PageAnon+0x5/0xd
      [  822.340033]  [<ffffffff811062fe>] ? __page_set_anon_rmap+0x45/0x52
      [  822.340033]  [<ffffffff810e7bbc>] ? get_page+0x5/0xa
      [  822.340033]  [<ffffffff810e85ab>] ? __lru_cache_add+0x1a/0x3a
      [  822.340033]  [<ffffffff81087ea9>] ? current_kernel_time64+0x9/0x30
      [  822.340033]  [<ffffffff813940c4>] ? __sys_sendmsg+0x3c/0x5a
      [  822.340033]  [<ffffffff8148f597>] ?
      entry_SYSCALL_64_fastpath+0x12/0x6a
      [  822.340033] Code: 83 08 04 00 00 65 ff 00 48 8b 3c 24 e8 40 7c f2 ff
      eb 13 48 c7 c3 9f ff ff ff eb 0f 89 ce e8 f1 ae f1 ff 48 89 c3 48 85 db
      74 15 <48> 8b 83 08 04 00 00 65 ff 08 48 81 fb 00 f0 ff ff 76 0d eb 07
      [  822.340033] RIP  [<ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182
      [  822.340033]  RSP <ffff88001dad7a88>
      [  822.340033] CR2: 00000000000003a3
      [  822.435363] ---[ end trace 98cc65e6f6b8bf11 ]---
      
      After patch:
      $ip -f mpls route add 100 as 200 via inet 10.1.1.8
      RTNETLINK answers: Network is unreachable

      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Reported-by: David Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
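
      The register dump in the oops can be checked by hand, and a short
      sketch makes the arithmetic explicit. The Python below is a userspace
      model, not kernel code: it mimics the kernel's IS_ERR() range test
      with MAX_ERRNO = 4095, takes the RBX value from the trace, and assumes
      the 0x408 displacement read from the faulting instruction bytes
      (48 8b 83 08 04 00 00). The helper names is_err and ptr_err are
      illustrative, and 101 is the Linux value of ENETUNREACH.

```python
# Userspace model of the kernel's error-pointer convention on a 64-bit machine.
MAX_ERRNO = 4095            # largest errno that fits in an error pointer
WORD = 1 << 64              # pointers wrap modulo 2**64


def is_err(ptr: int) -> bool:
    """True if ptr lies in the top MAX_ERRNO values of the address space."""
    return ptr >= WORD - MAX_ERRNO


def ptr_err(ptr: int) -> int:
    """Recover the negative errno encoded in an error pointer."""
    return ptr - WORD


def old_check_passes(ptr: int) -> bool:
    """Pre-patch behaviour: only NULL is rejected."""
    return ptr != 0


def new_check_passes(ptr: int) -> bool:
    """Post-patch behaviour: NULL and error pointers are both rejected."""
    return ptr != 0 and not is_err(ptr)


rbx = 0xFFFFFFFFFFFFFF9B    # RBX from the oops: the value returned by the lookup
field_offset = 0x408        # assumed displacement of the dereferenced field

errno_value = ptr_err(rbx)                  # -101, i.e. -ENETUNREACH on Linux
fault_addr = (rbx + field_offset) % WORD    # 0x3a3, matching CR2 in the oops
```

      With these values the old check lets the error pointer through, the
      dereference wraps around to the low address reported in CR2, and the
      new check rejects the pointer so that the errno can be returned to
      userspace instead.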
  3. 12 Dec, 2015 4 commits
    • mpls: make via address optional for multipath routes · f20367df
      Robert Shearman committed
      The via address is optional for a single path route, yet is mandatory
      when the multipath attribute is used:
      
        # ip -f mpls route add 100 dev lo
        # ip -f mpls route add 101 nexthop dev lo
        RTNETLINK answers: Invalid argument
      
      Make them consistent by making the via address optional when the
      RTA_MULTIPATH attribute is being parsed so that both forms of
      specifying the route work.

      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mpls: fix out-of-bounds access when via address not specified · eb7809f0
      Robert Shearman committed
      When a via address isn't specified, the via table is left initialised
      to 0 (NEIGH_ARP_TABLE), and the via address length also left
      initialised to 0. This results in a via address array of length 0
      being allocated (contiguous with route and nexthop array), meaning
      that when a packet is sent using neigh_xmit the neighbour lookup and
      creation will cause an out-of-bounds access when accessing the 4 bytes
      of the IPv4 address it assumes it has been given a pointer to.
      
      This could be fixed by allocating the 4 bytes of via address necessary
      and leaving it as all zeroes. However, it seems wrong to me to use an
      ipv4 nexthop (including possibly ARPing for 0.0.0.0) when the user
      didn't specify to do so.
      
      Instead, set the via address table to NEIGH_NR_TABLES to signify it
      hasn't been specified and use this at forwarding time to signify a
      neigh_xmit using an L2 address consisting of the device address. This
      mechanism is the same as that used for both ARP and ND for loopback
      interfaces and those flagged as no-arp, which are all we can really
      support in this case.
      
      Fixes: cf4b24f0 ("mpls: reduce memory usage of routes")
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mpls: don't dump RTA_VIA attribute if not specified · 72dcac96
      Robert Shearman committed
      The problem seen is that when adding a route with a nexthop with no
      via address specified, iproute2 generates bogus output:
      
        # ip -f mpls route add 100 dev lo
        # ip -f mpls route list
        100 via inet 0.0.8.0 dev lo
      
      The reason for this is that the kernel generates an RTA_VIA attribute
      with the family set to AF_INET, but the via address data having zero
      length. The cause of family being AF_INET is that on route insert
      cfg->rc_via_table is left set to 0, which just happens to be
      NEIGH_ARP_TABLE which is then translated into AF_INET.
      
      iproute2 doesn't validate the length prior to printing and so prints
      garbage. Although it could be fixed to do the validation, I would
      argue that AF_INET addresses should always be exactly 4 bytes so the
      kernel is really giving userspace bogus data.
      
      Therefore, avoid generating the RTA_VIA attribute when dumping the
      route if the via address wasn't specified on add/modify. This is
      indicated by NEIGH_ARP_TABLE and a zero via address length - if the
      user specified a via address the address length would have been
      validated such that it was 4 bytes. Although this is a change in
      behaviour that is visible to userspace, I believe that what was
      generated before was invalid and as such userspace wouldn't be
      expecting it.

      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mpls: validate L2 via address length · a3e948e8
      Robert Shearman committed
      If an L2 via address for an mpls nexthop is specified, the length of
      the L2 address must match that expected by the output device,
      otherwise it could access memory beyond the end of the via address
      buffer in the route.
      
      This check was present prior to commit f8efb73c ("mpls: multipath
      route support"), but got lost in the refactoring, so add it back,
      applying it to all nexthops in multipath routes.
      
      Fixes: f8efb73c ("mpls: multipath route support")
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 04 Dec, 2015 1 commit
    • mpls: support for dead routes · c89359a4
      Roopa Prabhu committed
      Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
      routes due to link events. Also adds code to ignore dead
      routes during route selection.
      
      Unlike ip routes, mpls routes are not deleted when the route goes
      dead. This is current mpls behaviour and this patch does not change
      that. With this patch, however, routes will be marked dead.
      Dead routes are not notified to userspace (this is consistent with ipv4
      routes).
      
      dead routes:
      -----------
      $ip -f mpls route show
      100
          nexthop as to 200 via inet 10.1.1.2  dev swp1
          nexthop as to 700 via inet 10.1.1.6  dev swp2
      
      $ip link set dev swp1 down
      
      $ip link show dev swp1
      4: swp1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode
      DEFAULT group default qlen 1000
          link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff
      
      $ip -f mpls route show
      100
          nexthop as to 200 via inet 10.1.1.2  dev swp1 dead linkdown
          nexthop as to 700 via inet 10.1.1.6  dev swp2
      
      linkdown routes:
      ----------------
      $ip -f mpls route show
      100
          nexthop as to 200 via inet 10.1.1.2  dev swp1
          nexthop as to 700 via inet 10.1.1.6  dev swp2
      
      $ip link show dev swp1
      4: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
      state UP mode DEFAULT group default qlen 1000
          link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff
      
      /* carrier goes down */
      $ip link show dev swp1
      4: swp1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
      state DOWN mode DEFAULT group default qlen 1000
          link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff
      
      $ip -f mpls route show
      100
          nexthop as to 200 via inet 10.1.1.2  dev swp1 linkdown
          nexthop as to 700 via inet 10.1.1.6  dev swp2

      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Acked-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
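
      The flag transitions shown in the two transcripts can be modelled in
      a few lines. This is a minimal sketch, not the kernel code: the
      Nexthop class and the on_link_event and select_alive helpers are
      hypothetical names, the flag values are those of RTNH_F_DEAD (1) and
      RTNH_F_LINKDOWN (16) in the rtnetlink UAPI header, and it assumes
      that linkdown nexthops are skipped during selection in the same way
      as dead ones.

```python
from dataclasses import dataclass

RTNH_F_DEAD = 1        # nexthop is dead (device administratively down)
RTNH_F_LINKDOWN = 16   # carrier is down on the nexthop's device


@dataclass
class Nexthop:
    dev: str
    flags: int = 0


def on_link_event(nexthops, dev, admin_up, carrier_ok):
    """Update the flags of every nexthop that uses `dev` after a link event."""
    for nh in nexthops:
        if nh.dev != dev:
            continue
        nh.flags &= ~(RTNH_F_DEAD | RTNH_F_LINKDOWN)
        if not admin_up:
            # Interface set down: shown as "dead linkdown" in the listing.
            nh.flags |= RTNH_F_DEAD | RTNH_F_LINKDOWN
        elif not carrier_ok:
            # Carrier lost while still admin up: shown as "linkdown".
            nh.flags |= RTNH_F_LINKDOWN


def select_alive(nexthops, flow_hash):
    """Pick a nexthop by hash, ignoring dead or linkdown entries."""
    alive = [nh for nh in nexthops
             if not nh.flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)]
    if not alive:
        return None
    return alive[flow_hash % len(alive)]


route_100 = [Nexthop("swp1"), Nexthop("swp2")]
on_link_event(route_100, "swp1", admin_up=False, carrier_ok=False)
chosen = select_alive(route_100, flow_hash=0)   # the swp2 nexthop
```

      Note that the route itself stays in the table throughout; only the
      per-nexthop flags change, which matches the behaviour described in
      the commit message.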
  5. 28 Oct, 2015 2 commits
    • mpls: reduce memory usage of routes · cf4b24f0
      Robert Shearman committed
      Nexthops for MPLS routes have a via address field sized for the
      largest via address that is expected, which is 32 bytes. This means
      that in the most common case of having ipv4 via addresses, 28 bytes of
      memory more than required are used per nexthop. In the other common
      case of an ipv6 nexthop, 16 bytes more than required are
      used. With large numbers of MPLS routes this extra memory usage could
      start to become significant.
      
      To avoid allocating memory for a maximum length via address when not
      all of it is required and to allow for ease of iterating over
      nexthops, the via addresses are changed to be stored in the same
      memory block as the route and nexthops, but in an array after the end
      of the array of nexthops. New accessors are provided to retrieve a
      pointer to the via address.
      
      To allow for O(1) access without having to store a pointer or offset
      per nh, the via address for each nexthop is sized according to the
      maximum via address for any nexthop in the route, which is stored in a
      new route field, rt_max_alen, but this is in an existing hole in
      struct mpls_route so it doesn't increase the size of the
      structure. Each via address is ensured to be aligned to VIA_ALEN_ALIGN
      to account for architectures that don't allow unaligned accesses.

      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
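
      The layout described in this commit message can be illustrated with
      a small offset calculation. This is a sketch under stated
      assumptions: the header and nexthop sizes passed in are hypothetical
      placeholders rather than the real sizes of struct mpls_route and
      struct mpls_nh, VIA_ALEN_ALIGN is assumed to be 8 bytes, and the
      function names are made up for illustration.

```python
VIA_ALEN_ALIGN = 8   # assumed alignment of each via address slot, in bytes


def align(n: int, a: int) -> int:
    """Round n up to the next multiple of a."""
    return (n + a - 1) // a * a


def via_offset(hdr_size, nh_size, num_nh, rt_max_alen, nh_index):
    """Byte offset of nexthop nh_index's via address within the route block.

    The via array starts after the nexthop array, and every slot has the
    same size (derived from rt_max_alen), so the lookup is O(1) and needs
    no per-nexthop pointer or offset.
    """
    if not 0 <= nh_index < num_nh:
        raise IndexError("nexthop index out of range")
    via_base = align(hdr_size + nh_size * num_nh, VIA_ALEN_ALIGN)
    return via_base + align(rt_max_alen, VIA_ALEN_ALIGN) * nh_index


def route_alloc_size(hdr_size, nh_size, num_nh, rt_max_alen):
    """Total bytes for route header, nexthop array and via array."""
    via_base = align(hdr_size + nh_size * num_nh, VIA_ALEN_ALIGN)
    return via_base + align(rt_max_alen, VIA_ALEN_ALIGN) * num_nh


# Hypothetical sizes: 24-byte header, 24-byte nexthops, three IPv4 nexthops.
offsets = [via_offset(24, 24, 3, 4, i) for i in range(3)]
total = route_alloc_size(24, 24, 3, 4)
```

      Sizing every slot by the largest via address in the route trades a
      little padding for constant-time indexing, which is the design
      choice the message describes.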
    • mpls: fix forwarding using v4/v6 explicit null · b4e04fc7
      Robert Shearman committed
      Fill in the via address length for the predefined IPv4 and IPv6
      explicit-null label routes.
      
      Fixes: f8efb73c ("mpls: multipath route support")
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 23 Oct, 2015 2 commits
    • mpls: flow-based multipath selection · 1c78efa8
      Robert Shearman committed
      Change the selection of a multipath route to use a flow-based
      hash. This is more suitable for traffic sensitive to reordering within a
      flow (e.g. TCP, L2VPN), whilst still allowing a good distribution
      of traffic given enough flows.
      
      Selection of the path for a multipath route is done using a hash of:
      1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
         including entropy label, whichever is first.
      2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
         payload, if present.
      
      Naturally, a 5-tuple hash using L4 information in addition would be
      possible and be better in some scenarios, but there is a tradeoff
      between looking deeper into the packet to achieve good distribution,
      and packet forwarding performance, and I have erred on the side of the
      latter as the default.

      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
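
      The two hash inputs listed in this commit message can be sketched as
      follows. This is an illustration only, with several assumptions:
      zlib.crc32 stands in for whatever hash the kernel uses, the value 4
      for MAX_MP_SELECT_LABELS is assumed, label 7 is taken to be the
      entropy label indicator that precedes the entropy label, and the
      treatment of other reserved labels is simplified. The function names
      flow_hash and select_nexthop are hypothetical.

```python
import struct
import zlib

MAX_MP_SELECT_LABELS = 4   # assumed cap on the number of labels hashed
MPLS_LABEL_ENTROPY = 7     # entropy label indicator; the entropy label follows it


def flow_hash(labels, l3=None):
    """Hash a label stack (top first) and an optional (src, dst, proto) tuple.

    src and dst are raw address bytes, proto is the IP protocol number.
    """
    h = 0
    hashed = 0
    i = 0
    while i < len(labels) and hashed < MAX_MP_SELECT_LABELS:
        label = labels[i]
        if label == MPLS_LABEL_ENTROPY and i + 1 < len(labels):
            # Include the entropy label itself, then stop looking deeper.
            h = zlib.crc32(struct.pack("!I", labels[i + 1]), h)
            break
        h = zlib.crc32(struct.pack("!I", label), h)
        hashed += 1
        i += 1
    if l3 is not None:
        src, dst, proto = l3
        h = zlib.crc32(src + dst + bytes([proto]), h)
    return h


def select_nexthop(nexthops, labels, l3=None):
    """All packets of one flow map to the same nexthop, avoiding reordering."""
    return nexthops[flow_hash(labels, l3) % len(nexthops)]


paths = ["swp1", "swp2", "swp3"]
tcp_flow = (bytes([10, 1, 1, 2]), bytes([10, 1, 1, 6]), 6)
first = select_nexthop(paths, [100, 200], tcp_flow)
second = select_nexthop(paths, [100, 200], tcp_flow)   # same flow, same path
```

      Leaving L4 ports out of the hash keeps the per-packet work small, at
      the cost of coarser load distribution, which is the tradeoff the
      message describes.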
    • mpls: multipath route support · f8efb73c
      Roopa Prabhu committed
      This patch adds support for MPLS multipath routes.
      
      Includes following changes to support multipath:
      - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'
      
      - 'struct mpls_nh' represents an mpls nexthop label forwarding entry
      
      - moves mpls route and nexthop structures into internal.h
      
      - An mpls_route can point to multiple mpls_nh structs
      
      - the nexthops are maintained as an array (similar to ipv4 fib)
      
      - In the process of restructuring, this patch also consistently changes
        all labels to u8
      
      - Adds support to parse/fill RTA_MULTIPATH netlink attribute for
      multipath routes similar to ipv4/v6 fib
      
      - In this patch, the multipath route nexthop selection algorithm
      simply returns the first nexthop. It is replaced by a
      hash based algorithm from Robert Shearman in the next patch
      
      - mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
      Although mpls_route_update was implemented to update based on dev, it was
      never used that way, and the dev handling gets tricky with multiple
      nexthops: there is no single nexthop's dev to match against. So, this patch
      removes the unused 'dev' handling in mpls_route_update.
      
      - dead route/path handling will be implemented in a subsequent patch
      
      Example:
      
      $ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
                      nexthop as 700 via inet 10.1.1.6 dev swp2 \
                      nexthop as 800 via inet 40.1.1.2 dev swp3
      
      $ip  -f mpls route show
      100
              nexthop as to 200 via inet 10.1.1.2  dev swp1
              nexthop as to 700 via inet 10.1.1.6  dev swp2
              nexthop as to 800 via inet 40.1.1.2  dev swp3

      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Acked-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 01 Sep, 2015 1 commit
  8. 10 Aug, 2015 1 commit
    • mpls: Enforce payload type of traffic sent using explicit NULL · 118d5234
      Robert Shearman committed
      RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
      label on the stack, then after popping the resulting packet must be
      treated as an IPv4 packet and forwarded based on the IPv4 header. The
      same is true for IPv6 Explicit NULL with an IPv6 packet following.
      
      Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
      add an attribute that specifies the expected payload type for use at
      forwarding time for determining the type of the encapsulated packet
      instead of inspecting the first nibble of the packet.

      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
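
      The difference between the two ways of determining the payload type
      can be sketched as below. This is a simplified model, not the kernel
      implementation: payload_family is a hypothetical helper, and the only
      facts relied on are that reserved labels 0 and 2 are the IPv4 and
      IPv6 Explicit NULL labels and that the first nibble of an IP header
      carries the version.

```python
MPLS_LABEL_IPV4NULL = 0   # IPv4 Explicit NULL (reserved label)
MPLS_LABEL_IPV6NULL = 2   # IPv6 Explicit NULL (reserved label)


def payload_family(label: int, first_payload_byte: int):
    """Return "ipv4", "ipv6" or None for the packet left after popping `label`.

    For the Explicit NULL labels the expected type comes from the route,
    so the payload is not inspected. For any other label this sketch
    falls back to the version nibble of the first payload byte.
    """
    if label == MPLS_LABEL_IPV4NULL:
        return "ipv4"
    if label == MPLS_LABEL_IPV6NULL:
        return "ipv6"
    version = first_payload_byte >> 4
    return {4: "ipv4", 6: "ipv6"}.get(version)


# A payload starting with 0x60 looks like IPv6, but under label 0 it is
# still treated as IPv4, as the route's payload type dictates.
forced = payload_family(MPLS_LABEL_IPV4NULL, 0x60)
guessed = payload_family(100, 0x60)
```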
  9. 07 Aug, 2015 2 commits
  10. 04 Aug, 2015 1 commit
  11. 01 Aug, 2015 1 commit
  12. 22 Jul, 2015 2 commits
  13. 12 Jun, 2015 1 commit
    • mpls: handle device renames for per-device sysctls · 0fae3bf0
      Robert Shearman committed
      If a device is renamed and the original name is subsequently reused
      for a new device, the following warning is generated:
      
      sysctl duplicate entry: /net/mpls/conf/veth0//input
      CPU: 3 PID: 1379 Comm: ip Not tainted 4.1.0-rc4+ #20
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
       0000000000000000 0000000000000000 ffffffff81566aaf 0000000000000000
       ffffffff81236279 ffff88002f7d7f00 0000000000000000 ffff88000db336d8
       ffff88000db33698 0000000000000005 ffff88002e046000 ffff8800168c9280
      Call Trace:
       [<ffffffff81566aaf>] ? dump_stack+0x40/0x50
       [<ffffffff81236279>] ? __register_sysctl_table+0x289/0x5a0
       [<ffffffffa051a24f>] ? mpls_dev_notify+0x1ff/0x300 [mpls_router]
       [<ffffffff8108db7f>] ? notifier_call_chain+0x4f/0x70
       [<ffffffff81470e72>] ? register_netdevice+0x2b2/0x480
       [<ffffffffa0524748>] ? veth_newlink+0x178/0x2d3 [veth]
       [<ffffffff8147f84c>] ? rtnl_newlink+0x73c/0x8e0
       [<ffffffff8147f27a>] ? rtnl_newlink+0x16a/0x8e0
       [<ffffffff81459ff2>] ? __kmalloc_reserve.isra.30+0x32/0x90
       [<ffffffff8147ccfd>] ? rtnetlink_rcv_msg+0x8d/0x250
       [<ffffffff8145b027>] ? __alloc_skb+0x47/0x1f0
       [<ffffffff8149badb>] ? __netlink_lookup+0xab/0xe0
       [<ffffffff8147cc70>] ? rtnetlink_rcv+0x30/0x30
       [<ffffffff8149e7a0>] ? netlink_rcv_skb+0xb0/0xd0
       [<ffffffff8147cc64>] ? rtnetlink_rcv+0x24/0x30
       [<ffffffff8149df17>] ? netlink_unicast+0x107/0x1a0
       [<ffffffff8149e4be>] ? netlink_sendmsg+0x50e/0x630
       [<ffffffff8145209c>] ? sock_sendmsg+0x3c/0x50
       [<ffffffff81452beb>] ? ___sys_sendmsg+0x27b/0x290
       [<ffffffff811bd258>] ? mem_cgroup_try_charge+0x88/0x110
       [<ffffffff811bd5b6>] ? mem_cgroup_commit_charge+0x56/0xa0
       [<ffffffff811d7700>] ? do_filp_open+0x30/0xa0
       [<ffffffff8145336e>] ? __sys_sendmsg+0x3e/0x80
       [<ffffffff8156c3f2>] ? system_call_fastpath+0x16/0x75
      
      Fix this by unregistering the previous sysctl table (registered for
      the path containing the original device name) and re-registering the
      table for the path containing the new device name.
      
      Fixes: 37bde799 ("mpls: Per-device enabling of packet input")
      Reported-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 08 Jun, 2015 1 commit
  15. 10 May, 2015 1 commit
  16. 06 May, 2015 1 commit
  17. 23 Apr, 2015 3 commits
  18. 13 Mar, 2015 1 commit
  19. 09 Mar, 2015 5 commits
  20. 07 Mar, 2015 1 commit
  21. 06 Mar, 2015 1 commit
  22. 05 Mar, 2015 1 commit
  23. 04 Mar, 2015 5 commits
    • mpls: Multicast route table change notifications · 8de147dc
      Eric W. Biederman committed
      Unlike IPv4, this code notifies in all cases where mpls routes
      are added or removed, and it never automatically removes routes.
      This avoids both the userspace confusion that is caused by omitting
      route updates and the possibility of a flood of netlink traffic
      when an interface goes down.
      
      For now reserved labels are handled automatically and userspace
      is not notified.

      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mpls: Netlink commands to add, remove, and dump routes · 03c05665
      Eric W. Biederman committed
      This change adds two new netlink routing attributes:
      RTA_VIA and RTA_NEWDST.
      
      RTA_VIA specifies the next machine to send a packet to,
      like RTA_GATEWAY.  RTA_VIA differs from RTA_GATEWAY in that it
      includes the address family of the address of the next machine to send
      a packet to.  Currently the MPLS code supports addresses in AF_INET,
      AF_INET6 and AF_PACKET.  For AF_INET and AF_INET6 the destination mac
      address is acquired from the neighbour table.  For AF_PACKET the
      destination mac_address is specified in the netlink configuration.
      
      I think raw destination mac address support with the family AF_PACKET
      will prove useful.  There is MPLS-TP which is defined to operate
      on machines that do not support internet packets of any flavor.  Further,
      there seem to be corner cases where it can be useful.  At this point
      I don't care much either way.
      
      RTA_NEWDST specifies the destination address to forward the packet
      with.  MPLS typically changes its destination address at every hop.
      For a swap operation RTA_NEWDST is specified with a length of one label.
      For a push operation RTA_NEWDST is specified with two or more labels.
      For a pop operation RTA_NEWDST is not specified or equivalently an empty
      RTA_NEWDST is specified.
      
      Those new netlink attributes are used to implement handling of rt-netlink
      RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE messages, to maintain the
      MPLS label table.
      
      rtm_to_route_config parses a netlink RTM_NEWROUTE or RTM_DELROUTE message,
      verifies that no unhandled attributes or unhandled values are present, and
      sets up the data structures for mpls_route_add and mpls_route_del.
      
      I did my best to match up with the existing conventions with the caveats
      that MPLS addresses are all destination-specific-addresses, and so
      don't properly have a scope.

      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mpls: Functions for reading and writing mpls labels over netlink · 966bae33
      Eric W. Biederman committed
      Reading and writing addresses in network byte order in netlink is
      traditional and I see no reason to change that.  MPLS is interesting
      as effectively it has variable length addresses (the MPLS label
      stack).  To represent these variable length addresses in netlink
      I use a valid MPLS label stack (complete with stop bit).
      
      This achieves two things: a well defined existing format is used,
      and the data can be interpreted without looking at its length.
      
      Not needing to look at the length to decode the variable length
      network representation allows existing userspace functions
      such as inet_ntop to be used without needing to change their
      prototype.

      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
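
      The self-delimiting property described in this commit message can be
      shown with a small encoder and decoder. This is a userspace sketch
      rather than the kernel helpers: it assumes the standard 32-bit label
      stack entry layout (20-bit label, 3-bit traffic class, 1-bit bottom
      of stack, 8-bit TTL) in network byte order, leaves traffic class and
      TTL at zero, and uses the hypothetical names encode_labels and
      decode_labels.

```python
import struct

LABEL_SHIFT = 12        # the label occupies the top 20 bits of each entry
STOP_BIT = 1 << 8       # bottom-of-stack ("stop") bit


def encode_labels(labels):
    """Encode labels (top of stack first) as a valid MPLS label stack."""
    out = b""
    for i, label in enumerate(labels):
        if not 0 <= label < (1 << 20):
            raise ValueError("label out of range")
        entry = label << LABEL_SHIFT
        if i == len(labels) - 1:
            entry |= STOP_BIT           # mark the last entry
        out += struct.pack("!I", entry)  # network byte order
    return out


def decode_labels(buf):
    """Decode until the stop bit, without being told the stack length."""
    labels = []
    for off in range(0, len(buf) - 3, 4):
        (entry,) = struct.unpack_from("!I", buf, off)
        labels.append(entry >> LABEL_SHIFT)
        if entry & STOP_BIT:
            return labels
    raise ValueError("no bottom-of-stack entry found")


wire = encode_labels([100, 200])
# Trailing bytes after the stop bit are ignored by the decoder.
roundtrip = decode_labels(wire + b"\xff" * 4)
```

      Because the decoder stops at the stop bit, a caller only needs a
      pointer to the start of the data, which is what lets functions with
      an inet_ntop-style prototype handle it.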
    • mpls: Basic support for adding and removing routes · a2519929
      Eric W. Biederman committed
      mpls_route_add and mpls_route_del implement the basic logic for adding
      and removing Next Hop Label Forwarding Entries from the MPLS input
      label map.  Addition and removal are done in a way that is
      consistent with how the existing routing tables in Linux are
      maintained, hence all of the work to deal with NLM_F_APPEND,
      NLM_F_EXCL, NLM_F_REPLACE, and NLM_F_CREATE.
      
      Cases that are not clearly defined, such as changing the interpretation
      of the mpls reserved labels, are not allowed.
      
      Because it seems like the right thing to do, adding an MPLS route without
      specifying an input label and allowing the kernel to pick a free label
      table entry is supported.  The implementation is currently less than optimal
      but that can be changed.
      
      As I don't have anything else to test with, ethernet and the loopback
      device are the only two device types currently supported for forwarding
      MPLS over.

      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
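
      How the four netlink flags might interact on a route add can be
      sketched with a dictionary standing in for the label table. This is
      an assumption-laden model, not the kernel's mpls_route_add: the
      numeric flag values are those of the netlink UAPI header, but the
      exact checks and returned errors shown here follow the conventional
      rtnetlink semantics and are assumed rather than taken from the
      commit, including the choice to reject NLM_F_APPEND. The route_add
      function and its arguments are hypothetical.

```python
import errno

NLM_F_REPLACE = 0x100   # replace an existing entry
NLM_F_EXCL = 0x200      # fail if the entry already exists
NLM_F_CREATE = 0x400    # create the entry if it does not exist
NLM_F_APPEND = 0x800    # add to the end of a list of entries


def route_add(table: dict, label: int, nhlfe, flags: int) -> int:
    """Insert or replace table[label]; return 0 or a negative errno."""
    if flags & NLM_F_APPEND:
        # Assumed: one entry per label, so appending is not supported.
        return -errno.EOPNOTSUPP
    exists = label in table
    if exists and (flags & NLM_F_EXCL):
        return -errno.EEXIST
    if exists and not (flags & NLM_F_REPLACE):
        return -errno.EEXIST
    if not exists and not (flags & NLM_F_CREATE):
        return -errno.ENOENT
    table[label] = nhlfe
    return 0


label_map = {}
created = route_add(label_map, 100, "via 10.1.1.2", NLM_F_CREATE | NLM_F_EXCL)
duplicate = route_add(label_map, 100, "via 10.1.1.6", NLM_F_CREATE | NLM_F_EXCL)
replaced = route_add(label_map, 100, "via 10.1.1.6", NLM_F_CREATE | NLM_F_REPLACE)
```

      In iproute2 terms, the first call corresponds roughly to "route
      add", and the third to "route replace".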
    • mpls: Add a sysctl to control the size of the mpls label table · 7720c01f
      Eric W. Biederman committed
      This sysctl gives two benefits.  By defaulting the table size to 0,
      mpls, even when compiled in and enabled, defaults to not forwarding
      any packets.  This prevents unpleasant surprises for users.
      
      The other benefit is that, as mpls labels are allocated locally, a
      small dense label table may be used, which saves memory and
      is extremely simple and efficient to implement.
      
      This sysctl allows userspace to choose the restrictions on the label
      table size userspace applications need to cope with.

      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>