1. 31 7月, 2017 2 次提交
    • J
      net sched actions: add time filter for action dumping · e62e484d
      Jamal Hadi Salim 提交于
      This patch adds support for filtering based on time since last used.
      When we are dumping a large number of actions it is useful to
      have the option of filtering based on when the action was last
      used to reduce the amount of data crossing to user space.
      
      With this patch the user space app sets the TCA_ROOT_TIME_DELTA
      attribute with the value in milliseconds with "time of interest
      since now".  The kernel converts this to jiffies and does the
      filtering comparison matching entries that have seen activity
      since then and returns them to user space.
      Old kernels and old tc continue to work in legacy mode since
      they dont specify this attribute.
      
      Some example (we have 400 actions bound to 400 filters); at
      installation time. Using updated when tc setting the time of
      interest to 120 seconds earlier (we see 400 actions):
      prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
      400
      
      go get some coffee and wait for > 120 seconds and try again:
      
      prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
      0
      
      Lets see a filter bound to one of these actions:
      ....
      filter pref 10 u32
      filter pref 10 u32 fh 800: ht divisor 1
      filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 2 success 1)
        match 7f000002/ffffffff at 12 (success 1 )
          action order 1: gact action pass
           random type none pass val 0
           index 23 ref 2 bind 1 installed 1145 sec used 802 sec
          Action statistics:
          Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
      ....
      
      that coffee took long, no? It was good.
      
      Now lets ping -c 1 127.0.0.2, then run the actions again:
      prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
      1
      
      More details please:
      prompt$ hackedtc -s actions ls action gact since 120000
      
          action order 0: gact action pass
           random type none pass val 0
           index 23 ref 2 bind 1 installed 1270 sec used 30 sec
          Action statistics:
          Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
      
      And the filter?
      
      filter pref 10 u32
      filter pref 10 u32 fh 800: ht divisor 1
      filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 4 success 2)
        match 7f000002/ffffffff at 12 (success 2 )
          action order 1: gact action pass
           random type none pass val 0
           index 23 ref 2 bind 1 installed 1324 sec used 84 sec
          Action statistics:
          Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e62e484d
    • J
      net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch · 90825b23
      Jamal Hadi Salim 提交于
      When you dump hundreds of thousands of actions, getting only 32 per
      dump batch even when the socket buffer and memory allocations allow
      is inefficient.
      
      With this change, the user will get as many as possibly fitting
      within the given constraints available to the kernel.
      
      The top level action TLV space is extended. An attribute
      TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
      is set by the user indicating the user is capable of processing
      these large dumps. Older user space which doesnt set this flag
      doesnt get the large (than 32) batches.
      The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
      actions are put in a single batch. As such user space app knows how long
      to iterate (independent of the type of action being dumped)
      instead of hardcoded maximum of 32 thus maintaining backward compat.
      
      Some results dumping 1.5M actions below:
      first an unpatched tc which doesnt understand these features...
      
      prompt$ time -p tc actions ls action gact | grep index | wc -l
      1500000
      real 1388.43
      user 2.07
      sys 1386.79
      
      Now lets see a patched tc which sets the correct flags when requesting
      a dump:
      
      prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
      1500000
      real 178.13
      user 2.02
      sys 176.96
      
      That is about 8x performance improvement for tc app which sets its
      receive buffer to about 32K.
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90825b23
  2. 21 6月, 2017 2 次提交
  3. 27 5月, 2017 1 次提交
  4. 18 5月, 2017 1 次提交
  5. 29 3月, 2017 1 次提交
  6. 14 3月, 2017 1 次提交
    • R
      mpls: allow TTL propagation to IP packets to be configured · 5b441ac8
      Robert Shearman 提交于
      Provide the ability to control on a per-route basis whether the TTL
      value from an MPLS packet is propagated to an IPv4/IPv6 packet when
      the last label is popped as per the theoretical model in RFC 3443
      through a new route attribute, RTA_TTL_PROPAGATE which can be 0 to
      mean disable propagation and 1 to mean enable propagation.
      
      In order to provide the ability to change the behaviour for packets
      arriving with IPv4/IPv6 Explicit Null labels and to provide an easy
      way for a user to change the behaviour for all existing routes without
      having to reprogram them, a global knob is provided. This is done
      through the addition of a new per-namespace sysctl,
      "net.mpls.ip_ttl_propagate", which defaults to enabled. If the
      per-route attribute is set (either enabled or disabled) then it
      overrides the global configuration.
      Signed-off-by: NRobert Shearman <rshearma@brocade.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b441ac8
  7. 13 3月, 2017 1 次提交
  8. 21 2月, 2017 1 次提交
  9. 03 1月, 2017 1 次提交
  10. 05 11月, 2016 1 次提交
  11. 19 10月, 2016 1 次提交
  12. 27 4月, 2016 1 次提交
  13. 22 4月, 2016 1 次提交
  14. 21 4月, 2016 1 次提交
    • R
      rtnetlink: add new RTM_GETSTATS message to dump link stats · 10c9ead9
      Roopa Prabhu 提交于
      This patch adds a new RTM_GETSTATS message to query link stats via netlink
      from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
      returns a lot more than just stats and is expensive in some cases when
      frequent polling for stats from userspace is a common operation.
      
      RTM_GETSTATS is an attempt to provide a light weight netlink message
      to explicity query only link stats from the kernel on an interface.
      The idea is to also keep it extensible so that new kinds of stats can be
      added to it in the future.
      
      This patch adds the following attribute for NETDEV stats:
      struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
              [IFLA_STATS_LINK_64]  = { .len = sizeof(struct rtnl_link_stats64) },
      };
      
      Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
      a single interface or all interfaces with NLM_F_DUMP.
      
      Future possible new types of stat attributes:
      link af stats:
          - IFLA_STATS_LINK_IPV6  (nested. for ipv6 stats)
          - IFLA_STATS_LINK_MPLS  (nested. for mpls/mdev stats)
      extended stats:
          - IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like bridge,
            vlan, vxlan etc)
          - IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are
            available via ethtool today)
      
      This patch also declares a filter mask for all stat attributes.
      User has to provide a mask of stats attributes to query. filter mask
      can be specified in the new hdr 'struct if_stats_msg' for stats messages.
      Other important field in the header is the ifindex.
      
      This api can also include attributes for global stats (eg tcp) in the future.
      When global stats are included in a stats msg, the ifindex in the header
      must be zero. A single stats message cannot contain both global and
      netdev specific stats. To easily distinguish them, netdev specific stat
      attributes name are prefixed with IFLA_STATS_LINK_
      
      Without any attributes in the filter_mask, no stats will be returned.
      
      This patch has been tested with mofified iproute2 ifstat.
      Suggested-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10c9ead9
  15. 18 12月, 2015 1 次提交
  16. 13 10月, 2015 1 次提交
  17. 16 9月, 2015 2 次提交
  18. 01 9月, 2015 1 次提交
  19. 18 8月, 2015 1 次提交
    • J
      lwtunnel: rename ip lwtunnel attributes · a1c234f9
      Jiri Benc 提交于
      We already have IFLA_IPTUN_ netlink attributes. The IP_TUN_ attributes look
      very similar, yet they serve very different purpose. This is confusing for
      anyone trying to implement a user space tool supporting lwt.
      
      As the IP_TUN_ attributes are used only for the lightweight tunnels, prefix
      them with LWTUNNEL_IP_ instead to make their purpose clear. Also, it's more
      logical to have them in lwtunnel.h together with the encap enum.
      
      Fixes: 3093fbe7 ("route: Per route IP tunnel metadata via lightweight tunnel")
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1c234f9
  20. 22 7月, 2015 2 次提交
  21. 24 6月, 2015 1 次提交
    • A
      net: track link-status of ipv4 nexthops · 8a3d0316
      Andy Gospodarek 提交于
      Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
      reachable via an interface where carrier is off.  No action is taken,
      but additional flags are passed to userspace to indicate carrier status.
      
      This also includes a cleanup to fib_disable_ip to more clearly indicate
      what event made the function call to replace the more cryptic force
      option previously used.
      
      v2: Split out kernel functionality into 2 patches, this patch simply
      sets and clears new nexthop flag RTNH_F_LINKDOWN.
      
      v3: Cleanups suggested by Alex as well as a bug noticed in
      fib_sync_down_dev and fib_sync_up when multipath was not enabled.
      
      v5: Whitespace and variable declaration fixups suggested by Dave.
      
      v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them
      all.
      Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
      Acked-by: NScott Feldman <sfeldma@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a3d0316
  22. 15 5月, 2015 1 次提交
  23. 08 4月, 2015 1 次提交
  24. 12 3月, 2015 1 次提交
  25. 06 3月, 2015 1 次提交
  26. 04 3月, 2015 2 次提交
    • E
      mpls: Multicast route table change notifications · 8de147dc
      Eric W. Biederman 提交于
      Unlike IPv4 this code notifies on all cases where mpls routes
      are added or removed and it never automatically removes routes.
      Avoiding both the userspace confusion that is caused by omitting
      route updates and the possibility of a flood of netlink traffic
      when an interface goes doew.
      
      For now reserved labels are handled automatically and userspace
      is not notified.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8de147dc
    • E
      mpls: Netlink commands to add, remove, and dump routes · 03c05665
      Eric W. Biederman 提交于
      This change adds two new netlink routing attributes:
      RTA_VIA and RTA_NEWDST.
      
      RTA_VIA specifies the specifies the next machine to send a packet to
      like RTA_GATEWAY.  RTA_VIA differs from RTA_GATEWAY in that it
      includes the address family of the address of the next machine to send
      a packet to.  Currently the MPLS code supports addresses in AF_INET,
      AF_INET6 and AF_PACKET.  For AF_INET and AF_INET6 the destination mac
      address is acquired from the neighbour table.  For AF_PACKET the
      destination mac_address is specified in the netlink configuration.
      
      I think raw destination mac address support with the family AF_PACKET
      will prove useful.  There is MPLS-TP which is defined to operate
      on machines that do not support internet packets of any flavor.  Further
      seem to be corner cases where it can be useful.  At this point
      I don't care much either way.
      
      RTA_NEWDST specifies the destination address to forward the packet
      with.  MPLS typically changes it's destination address at every hop.
      For a swap operation RTA_NEWDST is specified with a length of one label.
      For a push operation RTA_NEWDST is specified with two or more labels.
      For a pop operation RTA_NEWDST is not specified or equivalently an emtpy
      RTAN_NEWDST is specified.
      
      Those new netlink attributes are used to implement handling of rt-netlink
      RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE messages, to maintain the
      MPLS label table.
      
      rtm_to_route_config parses a netlink RTM_NEWROUTE or RTM_DELROUTE message,
      verify no unhandled attributes or unhandled values are present and sets
      up the data structures for mpls_route_add and mpls_route_del.
      
      I did my best to match up with the existing conventions with the caveats
      that MPLS addresses are all destination-specific-addresses, and so
      don't properly have a scope.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03c05665
  27. 20 1月, 2015 1 次提交
  28. 13 1月, 2015 1 次提交
  29. 06 1月, 2015 1 次提交
    • D
      net: tcp: add RTAX_CC_ALGO fib handling · ea697639
      Daniel Borkmann 提交于
      This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
      control metric to be set up and dumped back to user space.
      
      While the internal representation of RTAX_CC_ALGO is handled as a u32
      key, we avoided to expose this implementation detail to user space, thus
      instead, we chose the netlink attribute that is being exchanged between
      user space to be the actual congestion control algorithm name, similarly
      as in the setsockopt(2) API in order to allow for maximum flexibility,
      even for 3rd party modules.
      
      It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as
      it should have been stored in RTAX_FEATURES instead, we first thought
      about reusing it for the congestion control key, but it brings more
      complications and/or confusion than worth it.
      
      Joint work with Florian Westphal.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea697639
  30. 11 11月, 2014 1 次提交
  31. 20 6月, 2013 1 次提交
    • C
      tcp: introduce a per-route knob for quick ack · bcefe17c
      Cong Wang 提交于
      In previous discussions, I tried to find some reasonable heuristics
      for delayed ACK, however this seems not possible, according to Eric:
      
      	"ACKS might also be delayed because of bidirectional
      	traffic, and is more controlled by the application
      	response time. TCP stack can not easily estimate it."
      
      	"ACK can be incredibly useful to recover from losses in
      	a short time.
      
      	The vast majority of TCP sessions are small lived, and we
      	send one ACK per received segment anyway at beginning or
      	retransmits to let the sender smoothly increase its cwnd,
      	so an auto-tuning facility wont help them that much."
      
      and according to David:
      
      	"ACKs are the only information we have to detect loss.
      
      	And, for the same reasons that TCP VEGAS is fundamentally
      	broken, we cannot measure the pipe or some other
      	receiver-side-visible piece of information to determine
      	when it's "safe" to stretch ACK.
      
      	And even if it's "safe", we should not do it so that losses are
      	accurately detected and we don't spuriously retransmit.
      
      	The only way to know when the bandwidth increases is to
      	"test" it, by sending more and more packets until drops happen.
      	That's why all successful congestion control algorithms must
      	operate on explicited tested pieces of information.
      
      	Similarly, it's not really possible to universally know if
      	it's safe to stretch ACK or not."
      
      It still makes sense to enable or disable quick ack mode like
      what TCP_QUICK_ACK does.
      
      Similar to TCP_QUICK_ACK option, but for people who can't
      modify the source code and still wants to control
      TCP delayed ACK behavior. As David suggested, this should belong
      to per-path scope, since different pathes may want different
      behaviors.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Rick Jones <rick.jones2@hp.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Graf <tgraf@suug.ch>
      CC: David Laight <David.Laight@ACULAB.COM>
      Signed-off-by: NCong Wang <amwang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bcefe17c
  32. 14 2月, 2013 1 次提交
  33. 13 12月, 2012 1 次提交
  34. 08 12月, 2012 1 次提交
    • C
      bridge: export multicast database via netlink · ee07c6e7
      Cong Wang 提交于
      V5: fix two bugs pointed out by Thomas
          remove seq check for now, mark it as TODO
      
      V4: remove some useless #include
          some coding style fix
      
      V3: drop debugging printk's
          update selinux perm table as well
      
      V2: drop patch 1/2, export ifindex directly
          Redesign netlink attributes
          Improve netlink seq check
          Handle IPv6 addr as well
      
      This patch exports bridge multicast database via netlink
      message type RTM_GETMDB. Similar to fdb, but currently bridge-specific.
      We may need to support modify multicast database too (RTM_{ADD,DEL}MDB).
      
      (Thanks to Thomas for patient reviews)
      
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NCong Wang <amwang@redhat.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee07c6e7
  35. 05 12月, 2012 1 次提交