1. 15 1月, 2020 2 次提交
    • N
      net: bridge: vlan: add rtm definitions and dump support · 8dcea187
      Nikolay Aleksandrov 提交于
      This patch adds vlan rtm definitions:
       - NEWVLAN: to be used for creating vlans, setting options and
         notifications
       - DELVLAN: to be used for deleting vlans
       - GETVLAN: used for dumping vlan information
      
      Dumping vlans which can span multiple messages is added now with basic
      information (vid and flags). We use nlmsg_parse() to validate the header
      length in order to be able to extend the message with filtering
      attributes later.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8dcea187
    • I
      ipv4: Add "offload" and "trap" indications to routes · 90b93f1b
      Ido Schimmel 提交于
      When performing L3 offload, routes and nexthops are usually programmed
      into two different tables in the underlying device. Therefore, the fact
      that a nexthop resides in hardware does not necessarily mean that all
      the associated routes also reside in hardware and vice-versa.
      
      While the kernel can signal to user space the presence of a nexthop in
      hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
      for routes. In addition, the fact that a route resides in hardware does
      not necessarily mean that the traffic is offloaded. For example,
      unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
      packets to the CPU so that the kernel will be able to generate the
      appropriate ICMP error packet.
      
      This patch adds an "offload" and "trap" indications to IPv4 routes, so
      that users will have better visibility into the offload process.
      
      'struct fib_alias' is extended with two new fields that indicate if the
      route resides in hardware or not and if it is offloading traffic from
      the kernel or trapping packets to it. Note that the new fields are added
      in the 6 bytes hole and therefore the struct still fits in a single
      cache line [1].
      
      Capable drivers are expected to invoke fib_alias_hw_flags_set() with the
      route's key in order to set the flags.
      
      The indications are dumped to user space via a new flags (i.e.,
      'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the
      ancillary header.
      
      v2:
      * Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set()
      
      [1]
      struct fib_alias {
              struct hlist_node  fa_list;                      /*     0    16 */
              struct fib_info *          fa_info;              /*    16     8 */
              u8                         fa_tos;               /*    24     1 */
              u8                         fa_type;              /*    25     1 */
              u8                         fa_state;             /*    26     1 */
              u8                         fa_slen;              /*    27     1 */
              u32                        tb_id;                /*    28     4 */
              s16                        fa_default;           /*    32     2 */
              u8                         offload:1;            /*    34: 0  1 */
              u8                         trap:1;               /*    34: 1  1 */
              u8                         unused:6;             /*    34: 2  1 */
      
              /* XXX 5 bytes hole, try to pack */
      
              struct callback_head rcu __attribute__((__aligned__(8))); /*    40    16 */
      
              /* size: 56, cachelines: 1, members: 12 */
              /* sum members: 50, holes: 1, sum holes: 5 */
              /* sum bitfield members: 8 bits (1 bytes) */
              /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
              /* last cacheline: 56 bytes */
      } __attribute__((__aligned__(8)));
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90b93f1b
  2. 02 10月, 2019 1 次提交
  3. 29 5月, 2019 1 次提交
    • D
      net: nexthop uapi · 65ee00a9
      David Ahern 提交于
      New UAPI for nexthops as standalone objects:
      - defines netlink ancillary header, struct nhmsg
      - RTM commands for nexthop objects, RTM_*NEXTHOP,
      - RTNLGRP for nexthop notifications, RTNLGRP_NEXTHOP,
      - Attributes for creating nexthops, NHA_*
      - Attribute for route specs to specify a nexthop by id, RTA_NH_ID.
      
      The nexthop attributes and semantics follow the route and RTA ones for
      device, gateway and lwt encap. Unique to nexthop objects are a blackhole
      and a group which contains references to other nexthop objects. With the
      exception of blackhole and group, nexthop objects MUST contain a device.
      Gateway and encap are optional. Nexthop groups can only reference other
      pre-existing nexthops by id. If the NHA_ID attribute is present that id
      is used for the nexthop. If not specified, one is auto assigned.
      
      Dump requests can include attributes:
      - NHA_GROUPS to return only nexthop groups,
      - NHA_MASTER to limit dumps to nexthops with devices enslaved to the
        given master (e.g., VRF)
      - NHA_OIF to limit dumps to nexthops using given device
      
      nlmsg_route_perms in selinux code is updated for the new RTM comands.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65ee00a9
  4. 24 7月, 2018 1 次提交
  5. 01 6月, 2018 1 次提交
  6. 24 5月, 2018 1 次提交
  7. 18 1月, 2018 2 次提交
  8. 16 12月, 2017 1 次提交
  9. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX license identifier to uapi header files with no license · 6f52b16c
      Greg Kroah-Hartman 提交于
      Many user space API headers are missing licensing information, which
      makes it hard for compliance tools to determine the correct license.
      
      By default are files without license information under the default
      license of the kernel, which is GPLV2.  Marking them GPLV2 would exclude
      them from being included in non GPLV2 code, which is obviously not
      intended. The user space API headers fall under the syscall exception
      which is in the kernels COPYING file:
      
         NOTE! This copyright does *not* cover user programs that use kernel
         services by normal system calls - this is merely considered normal use
         of the kernel, and does *not* fall under the heading of "derived work".
      
      otherwise syscall usage would not be possible.
      
      Update the files which contain no license information with an SPDX
      license identifier.  The chosen identifier is 'GPL-2.0 WITH
      Linux-syscall-note' which is the officially assigned identifier for the
      Linux syscall exception.  SPDX license identifiers are a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.  See the previous patch in this series for the
      methodology of how this patch was researched.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f52b16c
  10. 24 10月, 2017 1 次提交
    • C
      tcp: Configure TFO without cookie per socket and/or per route · 71c02379
      Christoph Paasch 提交于
      We already allow to enable TFO without a cookie by using the
      fastopen-sysctl and setting it to TFO_SERVER_COOKIE_NOT_REQD (or
      TFO_CLIENT_NO_COOKIE).
      This is safe to do in certain environments where we know that there
      isn't a malicous host (aka., data-centers) or when the
      application-protocol already provides an authentication mechanism in the
      first flight of data.
      
      A server however might be providing multiple services or talking to both
      sides (public Internet and data-center). So, this server would want to
      enable cookie-less TFO for certain services and/or for connections that
      go to the data-center.
      
      This patch exposes a socket-option and a per-route attribute to enable such
      fine-grained configurations.
      Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
      Reviewed-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71c02379
  11. 31 7月, 2017 2 次提交
    • J
      net sched actions: add time filter for action dumping · e62e484d
      Jamal Hadi Salim 提交于
      This patch adds support for filtering based on time since last used.
      When we are dumping a large number of actions it is useful to
      have the option of filtering based on when the action was last
      used to reduce the amount of data crossing to user space.
      
      With this patch the user space app sets the TCA_ROOT_TIME_DELTA
      attribute with the value in milliseconds with "time of interest
      since now".  The kernel converts this to jiffies and does the
      filtering comparison matching entries that have seen activity
      since then and returns them to user space.
      Old kernels and old tc continue to work in legacy mode since
      they dont specify this attribute.
      
      Some example (we have 400 actions bound to 400 filters); at
      installation time. Using updated when tc setting the time of
      interest to 120 seconds earlier (we see 400 actions):
      prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
      400
      
      go get some coffee and wait for > 120 seconds and try again:
      
      prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
      0
      
      Lets see a filter bound to one of these actions:
      ....
      filter pref 10 u32
      filter pref 10 u32 fh 800: ht divisor 1
      filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 2 success 1)
        match 7f000002/ffffffff at 12 (success 1 )
          action order 1: gact action pass
           random type none pass val 0
           index 23 ref 2 bind 1 installed 1145 sec used 802 sec
          Action statistics:
          Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
      ....
      
      that coffee took long, no? It was good.
      
      Now lets ping -c 1 127.0.0.2, then run the actions again:
      prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
      1
      
      More details please:
      prompt$ hackedtc -s actions ls action gact since 120000
      
          action order 0: gact action pass
           random type none pass val 0
           index 23 ref 2 bind 1 installed 1270 sec used 30 sec
          Action statistics:
          Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
      
      And the filter?
      
      filter pref 10 u32
      filter pref 10 u32 fh 800: ht divisor 1
      filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 4 success 2)
        match 7f000002/ffffffff at 12 (success 2 )
          action order 1: gact action pass
           random type none pass val 0
           index 23 ref 2 bind 1 installed 1324 sec used 84 sec
          Action statistics:
          Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e62e484d
    • J
      net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch · 90825b23
      Jamal Hadi Salim 提交于
      When you dump hundreds of thousands of actions, getting only 32 per
      dump batch even when the socket buffer and memory allocations allow
      is inefficient.
      
      With this change, the user will get as many as possibly fitting
      within the given constraints available to the kernel.
      
      The top level action TLV space is extended. An attribute
      TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
      is set by the user indicating the user is capable of processing
      these large dumps. Older user space which doesnt set this flag
      doesnt get the large (than 32) batches.
      The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
      actions are put in a single batch. As such user space app knows how long
      to iterate (independent of the type of action being dumped)
      instead of hardcoded maximum of 32 thus maintaining backward compat.
      
      Some results dumping 1.5M actions below:
      first an unpatched tc which doesnt understand these features...
      
      prompt$ time -p tc actions ls action gact | grep index | wc -l
      1500000
      real 1388.43
      user 2.07
      sys 1386.79
      
      Now lets see a patched tc which sets the correct flags when requesting
      a dump:
      
      prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
      1500000
      real 178.13
      user 2.02
      sys 176.96
      
      That is about 8x performance improvement for tc app which sets its
      receive buffer to about 32K.
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90825b23
  12. 21 6月, 2017 2 次提交
  13. 27 5月, 2017 1 次提交
  14. 18 5月, 2017 1 次提交
  15. 29 3月, 2017 1 次提交
  16. 14 3月, 2017 1 次提交
    • R
      mpls: allow TTL propagation to IP packets to be configured · 5b441ac8
      Robert Shearman 提交于
      Provide the ability to control on a per-route basis whether the TTL
      value from an MPLS packet is propagated to an IPv4/IPv6 packet when
      the last label is popped as per the theoretical model in RFC 3443
      through a new route attribute, RTA_TTL_PROPAGATE which can be 0 to
      mean disable propagation and 1 to mean enable propagation.
      
      In order to provide the ability to change the behaviour for packets
      arriving with IPv4/IPv6 Explicit Null labels and to provide an easy
      way for a user to change the behaviour for all existing routes without
      having to reprogram them, a global knob is provided. This is done
      through the addition of a new per-namespace sysctl,
      "net.mpls.ip_ttl_propagate", which defaults to enabled. If the
      per-route attribute is set (either enabled or disabled) then it
      overrides the global configuration.
      Signed-off-by: NRobert Shearman <rshearma@brocade.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b441ac8
  17. 13 3月, 2017 1 次提交
  18. 21 2月, 2017 1 次提交
  19. 03 1月, 2017 1 次提交
  20. 05 11月, 2016 1 次提交
  21. 19 10月, 2016 1 次提交
  22. 27 4月, 2016 1 次提交
  23. 22 4月, 2016 1 次提交
  24. 21 4月, 2016 1 次提交
    • R
      rtnetlink: add new RTM_GETSTATS message to dump link stats · 10c9ead9
      Roopa Prabhu 提交于
      This patch adds a new RTM_GETSTATS message to query link stats via netlink
      from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
      returns a lot more than just stats and is expensive in some cases when
      frequent polling for stats from userspace is a common operation.
      
      RTM_GETSTATS is an attempt to provide a light weight netlink message
      to explicity query only link stats from the kernel on an interface.
      The idea is to also keep it extensible so that new kinds of stats can be
      added to it in the future.
      
      This patch adds the following attribute for NETDEV stats:
      struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
              [IFLA_STATS_LINK_64]  = { .len = sizeof(struct rtnl_link_stats64) },
      };
      
      Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
      a single interface or all interfaces with NLM_F_DUMP.
      
      Future possible new types of stat attributes:
      link af stats:
          - IFLA_STATS_LINK_IPV6  (nested. for ipv6 stats)
          - IFLA_STATS_LINK_MPLS  (nested. for mpls/mdev stats)
      extended stats:
          - IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like bridge,
            vlan, vxlan etc)
          - IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are
            available via ethtool today)
      
      This patch also declares a filter mask for all stat attributes.
      User has to provide a mask of stats attributes to query. filter mask
      can be specified in the new hdr 'struct if_stats_msg' for stats messages.
      Other important field in the header is the ifindex.
      
      This api can also include attributes for global stats (eg tcp) in the future.
      When global stats are included in a stats msg, the ifindex in the header
      must be zero. A single stats message cannot contain both global and
      netdev specific stats. To easily distinguish them, netdev specific stat
      attributes name are prefixed with IFLA_STATS_LINK_
      
      Without any attributes in the filter_mask, no stats will be returned.
      
      This patch has been tested with mofified iproute2 ifstat.
      Suggested-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10c9ead9
  25. 18 12月, 2015 1 次提交
  26. 13 10月, 2015 1 次提交
  27. 16 9月, 2015 2 次提交
  28. 01 9月, 2015 1 次提交
  29. 18 8月, 2015 1 次提交
    • J
      lwtunnel: rename ip lwtunnel attributes · a1c234f9
      Jiri Benc 提交于
      We already have IFLA_IPTUN_ netlink attributes. The IP_TUN_ attributes look
      very similar, yet they serve very different purpose. This is confusing for
      anyone trying to implement a user space tool supporting lwt.
      
      As the IP_TUN_ attributes are used only for the lightweight tunnels, prefix
      them with LWTUNNEL_IP_ instead to make their purpose clear. Also, it's more
      logical to have them in lwtunnel.h together with the encap enum.
      
      Fixes: 3093fbe7 ("route: Per route IP tunnel metadata via lightweight tunnel")
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1c234f9
  30. 22 7月, 2015 2 次提交
  31. 24 6月, 2015 1 次提交
    • A
      net: track link-status of ipv4 nexthops · 8a3d0316
      Andy Gospodarek 提交于
      Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
      reachable via an interface where carrier is off.  No action is taken,
      but additional flags are passed to userspace to indicate carrier status.
      
      This also includes a cleanup to fib_disable_ip to more clearly indicate
      what event made the function call to replace the more cryptic force
      option previously used.
      
      v2: Split out kernel functionality into 2 patches, this patch simply
      sets and clears new nexthop flag RTNH_F_LINKDOWN.
      
      v3: Cleanups suggested by Alex as well as a bug noticed in
      fib_sync_down_dev and fib_sync_up when multipath was not enabled.
      
      v5: Whitespace and variable declaration fixups suggested by Dave.
      
      v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them
      all.
      Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
      Acked-by: NScott Feldman <sfeldma@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a3d0316
  32. 15 5月, 2015 1 次提交
  33. 08 4月, 2015 1 次提交
  34. 12 3月, 2015 1 次提交