1. 09 10月, 2018 3 次提交
  2. 25 9月, 2018 1 次提交
    • S
      mpls: allow routes on ip6gre devices · d8e2262a
      Saif Hasan 提交于
      Summary:
      
      This appears to be necessary and sufficient change to enable `MPLS` on
      `ip6gre` tunnels (RFC4023).
      
      This diff allows IP6GRE devices to be recognized by MPLS kernel module
      and hence user can configure interface to accept packets with mpls
      headers as well setup mpls routes on them.
      
      Test Plan:
      
      Test plan consists of multiple containers connected via GRE-V6 tunnel.
      Then carrying out testing steps as below.
      
      - Carry out necessary sysctl settings on all containers
      
      ```
      sysctl -w net.mpls.platform_labels=65536
      sysctl -w net.mpls.ip_ttl_propagate=1
      sysctl -w net.mpls.conf.lo.input=1
      ```
      
      - Establish IP6GRE tunnels
      
      ```
      ip -6 tunnel add name if_1_2_1 mode ip6gre \
        local 2401:db00:21:6048:feed:0::1 \
        remote 2401:db00:21:6048:feed:0::2 key 1
      ip link set dev if_1_2_1 up
      sysctl -w net.mpls.conf.if_1_2_1.input=1
      ip -4 addr add 169.254.0.2/31 dev if_1_2_1 scope link
      
      ip -6 tunnel add name if_1_3_1 mode ip6gre \
        local 2401:db00:21:6048:feed:0::1 \
        remote 2401:db00:21:6048:feed:0::3 key 1
      ip link set dev if_1_3_1 up
      sysctl -w net.mpls.conf.if_1_3_1.input=1
      ip -4 addr add 169.254.0.4/31 dev if_1_3_1 scope link
      ```
      
      - Install MPLS encap rules on node-1 towards node-2
      
      ```
      ip route add 192.168.0.11/32 nexthop encap mpls 32/64 \
        via inet 169.254.0.3 dev if_1_2_1
      ```
      
      - Install MPLS forwarding rules on node-2 and node-3
      ```
      // node2
      ip -f mpls route add 32 via inet 169.254.0.7 dev if_2_4_1
      
      // node3
      ip -f mpls route add 64 via inet 169.254.0.12 dev if_4_3_1
      ```
      
      - Ping 192.168.0.11 (node4) from 192.168.0.1 (node1) (where routing
        towards 192.168.0.1 is via IP route directly towards node1 from node4)
      ```
      ping 192.168.0.11
      ```
      
      - tcpdump on interface to capture ping packets wrapped within MPLS
        header which inturn wrapped within IP6GRE header
      
      ```
      16:43:41.121073 IP6
        2401:db00:21:6048:feed::1 > 2401:db00:21:6048:feed::2:
        DSTOPT GREv0, key=0x1, length 100:
        MPLS (label 32, exp 0, ttl 255) (label 64, exp 0, [S], ttl 255)
        IP 192.168.0.1 > 192.168.0.11:
        ICMP echo request, id 1208, seq 45, length 64
      
      0x0000:  6000 2cdb 006c 3c3f 2401 db00 0021 6048  `.,..l<?$....!`H
      0x0010:  feed 0000 0000 0001 2401 db00 0021 6048  ........$....!`H
      0x0020:  feed 0000 0000 0002 2f00 0401 0401 0100  ......../.......
      0x0030:  2000 8847 0000 0001 0002 00ff 0004 01ff  ...G............
      0x0040:  4500 0054 3280 4000 ff01 c7cb c0a8 0001  E..T2.@.........
      0x0050:  c0a8 000b 0800 a8d7 04b8 002d 2d3c a05b  ...........--<.[
      0x0060:  0000 0000 bcd8 0100 0000 0000 1011 1213  ................
      0x0070:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
      0x0080:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
      0x0090:  3435 3637                                4567
      ```
      Signed-off-by: NSaif Hasan <has@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d8e2262a
  3. 28 3月, 2018 1 次提交
  4. 18 3月, 2018 1 次提交
  5. 05 3月, 2018 1 次提交
  6. 09 2月, 2018 1 次提交
    • D
      mpls, nospec: Sanitize array index in mpls_label_ok() · 3968523f
      Dan Williams 提交于
      mpls_label_ok() validates that the 'platform_label' array index from a
      userspace netlink message payload is valid. Under speculation the
      mpls_label_ok() result may not resolve in the CPU pipeline until after
      the index is used to access an array element. Sanitize the index to zero
      to prevent userspace-controlled arbitrary out-of-bounds speculation, a
      precursor for a speculative execution side channel vulnerability.
      
      Cc: <stable@vger.kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3968523f
  7. 05 12月, 2017 1 次提交
  8. 12 10月, 2017 1 次提交
  9. 08 10月, 2017 1 次提交
  10. 10 8月, 2017 1 次提交
  11. 08 7月, 2017 1 次提交
  12. 05 7月, 2017 1 次提交
  13. 04 7月, 2017 1 次提交
    • R
      mpls: route get support · 397fc9e5
      Roopa Prabhu 提交于
      This patch adds RTM_GETROUTE doit handler for mpls routes.
      
      Input:
      RTA_DST - input label
      RTA_NEWDST - labels in packet for multipath selection
      
      By default the getroute handler returns matched
      nexthop label, via and oif
      
      With RTM_F_FIB_MATCH flag, full matched route is
      returned.
      
      example (with patched iproute2):
      $ip -f mpls route show
      101
              nexthop as to 102/103 via inet 172.16.2.2 dev virt1-2
              nexthop as to 302/303 via inet 172.16.12.2 dev virt1-12
      201
              nexthop as to 202/203 via inet6 2001:db8:2::2 dev virt1-2
              nexthop as to 402/403 via inet6 2001:db8:12::2 dev virt1-12
      
      $ip -f mpls route get 103
      RTNETLINK answers: Network is unreachable
      
      $ip -f mpls route get 101
      101 as to 102/103 via inet 172.16.2.2 dev virt1-2
      
      $ip -f mpls route get as to 302/303 101
      101 as to 302/303 via inet 172.16.12.2 dev virt1-12
      
      $ip -f mpls route get fibmatch 103
      RTNETLINK answers: Network is unreachable
      
      $ip -f mpls route get fibmatch 101
      101
              nexthop as to 102/103 via inet 172.16.2.2 dev virt1-2
              nexthop as to 302/303 via inet 172.16.12.2 dev virt1-12
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      397fc9e5
  14. 01 6月, 2017 1 次提交
  15. 30 5月, 2017 5 次提交
  16. 09 5月, 2017 1 次提交
    • M
      treewide: use kv[mz]alloc* rather than opencoded variants · 752ade68
      Michal Hocko 提交于
      There are many code paths opencoding kvmalloc.  Let's use the helper
      instead.  The main difference to kvmalloc is that those users are
      usually not considering all the aspects of the memory allocator.  E.g.
      allocation requests <= 32kB (with 4kB pages) are basically never failing
      and invoke OOM killer to satisfy the allocation.  This sounds too
      disruptive for something that has a reasonable fallback - the vmalloc.
      On the other hand those requests might fallback to vmalloc even when the
      memory allocator would succeed after several more reclaim/compaction
      attempts previously.  There is no guarantee something like that happens
      though.
      
      This patch converts many of those places to kv[mz]alloc* helpers because
      they are more conservative.
      
      Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
      Acked-by: NKees Cook <keescook@chromium.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
      Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
      Acked-by: David Sterba <dsterba@suse.com> # btrfs
      Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
      Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Santosh Raspatur <santosh@chelsio.com>
      Cc: Hariprasad S <hariprasad@chelsio.com>
      Cc: Yishai Hadas <yishaih@mellanox.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: "Yan, Zheng" <zyan@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      752ade68
  17. 18 4月, 2017 1 次提交
  18. 14 4月, 2017 1 次提交
  19. 02 4月, 2017 6 次提交
    • D
      net: mpls: Increase max number of labels for lwt encap · 1511009c
      David Ahern 提交于
      Alow users to push down more labels per MPLS encap. Similar to LSR case,
      move label array to the end of mpls_iptunnel_encap and allocate based on
      the number of labels for the route.
      
      For consistency with the LSR case, re-use the same maximum number of
      labels.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1511009c
    • D
      net: mpls: bump maximum number of labels · a4ac8c98
      David Ahern 提交于
      Allow users to push down more labels per MPLS route. With the previous
      patches, no memory allocations are based on MAX_NEW_LABELS; the limit
      is only used to keep userspace in check.
      
      At this point MAX_NEW_LABELS is only used for mpls_route_config (copying
      route data from userspace) and processing nexthops looking for the max
      number of labels across the route spec.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4ac8c98
    • D
      net: mpls: Limit memory allocation for mpls_route · df1c6316
      David Ahern 提交于
      Limit memory allocation size for mpls_route to 4096.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df1c6316
    • D
      net: mpls: change mpls_route layout · 59b20966
      David Ahern 提交于
      Move labels to the end of mpls_nh as a 0-sized array and within mpls_route
      move the via for a nexthop after the mpls_nh. The new layout becomes:
      
         +----------------------+
         | mpls_route           |
         +----------------------+
         | mpls_nh 0            |
         +----------------------+
         | alignment padding    |   4 bytes for odd number of labels; 0 for even
         +----------------------+
         | via[rt_max_alen] 0   |
         +----------------------+
         | alignment padding    |   via's aligned on sizeof(unsigned long)
         +----------------------+
         | ...                  |
         +----------------------+
         | mpls_nh n-1          |
         +----------------------+
         | via[rt_max_alen] n-1 |
         +----------------------+
      
      Memory allocated for nexthop + via is constant across all nexthops and
      their via. It is based on the maximum number of labels across all nexthops
      and the maximum via length. The size is saved in the mpls_route as
      rt_nh_size. Accessing a nexthop becomes rt->rt_nh + index * rt->rt_nh_size.
      
      The offset of the via address from a nexthop is saved as rt_via_offset
      so that given an mpls_nh pointer the via for that hop is simply
      nh + rt->rt_via_offset.
      
      With prior code, memory allocated per mpls_route with 1 nexthop:
           via is an ethernet address - 64 bytes
           via is an ipv4 address     - 64
           via is an ipv6 address     - 72
      
      With this patch set, memory allocated per mpls_route with 1 nexthop and
      1 or 2 labels:
           via is an ethernet address - 56 bytes
           via is an ipv4 address     - 56
           via is an ipv6 address     - 64
      
      The 8-byte reduction is due to the previous patch; the change introduced
      by this patch has no impact on the size of allocations for 1 or 2 labels.
      
      Performance impact of this change was examined using network namespaces
      with veth pairs connecting namespaces. ns0 inserts the packet to the
      label-switched path using an lwt route with encap mpls. ns1 adds 1 or 2
      labels depending on test, ns2 (and ns3 for 2-label test) pops the label
      and forwards. ns3 (or ns4) for a 2-label is the destination. Similar
      series of namespaces used for 2-nexthop test.
      
      Intent is to measure changes to latency (overhead in manipulating the
      packet) in the forwarding path. Tests used netperf with UDP_RR.
      
      IPv4:                     current   patches
         1 label, 1 nexthop      29908     30115
         2 label, 1 nexthop      29071     29612
         1 label, 2 nexthop      29582     29776
         2 label, 2 nexthop      29086     29149
      
      IPv6:                     current   patches
         1 label, 1 nexthop      24502     24960
         2 label, 1 nexthop      24041     24407
         1 label, 2 nexthop      23795     23899
         2 label, 2 nexthop      23074     22959
      
      In short, the change has no effect to a modest increase in performance.
      This is expected since this patch does not really have an impact on routes
      with 1 or 2 labels (the current limit) and 1 or 2 nexthops.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59b20966
    • D
      net: mpls: Convert number of nexthops to u8 · 77ef013a
      David Ahern 提交于
      Number of nexthops and number of alive nexthops are tracked using an
      unsigned int. A route should never have more than 255 nexthops so
      convert both to u8. Update all references and intermediate variables
      to consistently use u8 as well.
      
      Shrinks the size of mpls_route from 32 bytes to 24 bytes with a 2-byte
      hole before the nexthops.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      77ef013a
    • D
      net: mpls: rt_nhn_alive and nh_flags should be accessed using READ_ONCE · 39eb8cd1
      David Ahern 提交于
      The number of alive nexthops for a route (rt->rt_nhn_alive) and the
      flags for a next hop (nh->nh_flags) are modified by netdev event
      handlers. The event handlers run with rtnl_lock held so updates are
      always done with the lock held. The packet path accesses the fields
      under the rcu lock. Since those fields can change at any moment in
      the packet path, both fields should be accessed using READ_ONCE. Updates
      to both fields should use WRITE_ONCE.
      
      Update mpls_select_multipath (packet path) and mpls_ifdown and mpls_ifup
      (event handlers) accordingly.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39eb8cd1
  20. 30 3月, 2017 1 次提交
  21. 29 3月, 2017 2 次提交
  22. 28 3月, 2017 2 次提交
  23. 25 3月, 2017 1 次提交
  24. 17 3月, 2017 1 次提交
    • D
      net: mpls: Fix nexthop alive tracking on down events · 61733c91
      David Ahern 提交于
      Alive tracking of nexthops can account for a link twice if the carrier
      goes down followed by an admin down of the same link rendering multipath
      routes useless. This is similar to 79099aab for UNREGISTER events and
      DOWN events.
      
      Fix by tracking number of alive nexthops in mpls_ifdown similar to the
      logic in mpls_ifup. Checking the flags per nexthop once after all events
      have been processed is simpler than trying to maintian a running count
      through all event combinations.
      
      Also, WRITE_ONCE is used instead of ACCESS_ONCE to set rt_nhn_alive
      per a comment from checkpatch:
          WARNING: Prefer WRITE_ONCE(<FOO>, <BAR>) over ACCESS_ONCE(<FOO>) = <BAR>
      
      Fixes: c89359a4 ("mpls: support for dead routes")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Acked-by: NRobert Shearman <rshearma@brocade.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61733c91
  25. 14 3月, 2017 2 次提交
    • R
      mpls: allow TTL propagation from IP packets to be configured · a59166e4
      Robert Shearman 提交于
      Allow TTL propagation from IP packets to MPLS packets to be
      configured. Add a new optional LWT attribute, MPLS_IPTUNNEL_TTL, which
      allows the TTL to be set in the resulting MPLS packet, with the value
      of 0 having the semantics of enabling propagation of the TTL from the
      IP header (i.e. non-zero values disable propagation).
      
      Also allow the configuration to be overridden globally by reusing the
      same sysctl to control whether the TTL is propagated from IP packets
      into the MPLS header. If the per-LWT attribute is set then it
      overrides the global configuration. If the TTL isn't propagated then a
      default TTL value is used which can be configured via a new sysctl,
      "net.mpls.default_ttl". This is kept separate from the configuration
      of whether IP TTL propagation is enabled as it can be used in the
      future when non-IP payloads are supported (i.e. where there is no
      payload TTL that can be propagated).
      Signed-off-by: NRobert Shearman <rshearma@brocade.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a59166e4
    • R
      mpls: allow TTL propagation to IP packets to be configured · 5b441ac8
      Robert Shearman 提交于
      Provide the ability to control on a per-route basis whether the TTL
      value from an MPLS packet is propagated to an IPv4/IPv6 packet when
      the last label is popped as per the theoretical model in RFC 3443
      through a new route attribute, RTA_TTL_PROPAGATE which can be 0 to
      mean disable propagation and 1 to mean enable propagation.
      
      In order to provide the ability to change the behaviour for packets
      arriving with IPv4/IPv6 Explicit Null labels and to provide an easy
      way for a user to change the behaviour for all existing routes without
      having to reprogram them, a global knob is provided. This is done
      through the addition of a new per-namespace sysctl,
      "net.mpls.ip_ttl_propagate", which defaults to enabled. If the
      per-route attribute is set (either enabled or disabled) then it
      overrides the global configuration.
      Signed-off-by: NRobert Shearman <rshearma@brocade.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b441ac8
  26. 13 3月, 2017 1 次提交
    • D
      mpls: Do not decrement alive counter for unregister events · 79099aab
      David Ahern 提交于
      Multipath routes can be rendered usesless when a device in one of the
      paths is deleted. For example:
      
      $ ip -f mpls ro ls
      100
      	nexthop as to 200 via inet 172.16.2.2  dev virt12
      	nexthop as to 300 via inet 172.16.3.2  dev br0
      101
      	nexthop as to 201 via inet6 2000:2::2  dev virt12
      	nexthop as to 301 via inet6 2000:3::2  dev br0
      
      $ ip li del br0
      
      When br0 is deleted the other hop is not considered in
      mpls_select_multipath because of the alive check -- rt_nhn_alive
      is 0.
      
      rt_nhn_alive is decremented once in mpls_ifdown when the device is taken
      down (NETDEV_DOWN) and again when it is deleted (NETDEV_UNREGISTER). For
      a 2 hop route, deleting one device drops the alive count to 0. Since
      devices are taken down before unregistering, the decrement on
      NETDEV_UNREGISTER is redundant.
      
      Fixes: c89359a4 ("mpls: support for dead routes")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79099aab