1. 05 6月, 2015 1 次提交
  2. 26 5月, 2015 3 次提交
  3. 14 5月, 2015 3 次提交
  4. 04 5月, 2015 1 次提交
    • T
      ipv6: Flow label state ranges · 82a584b7
      Tom Herbert 提交于
      This patch divides the IPv6 flow label space into two ranges:
      0-7ffff is reserved for flow label manager, 80000-fffff will be
      used for creating auto flow labels (per RFC6438). This only affects how
      labels are set on transmit, it does not affect receive. This range split
      can be disbaled by systcl.
      
      Background:
      
      IPv6 flow labels have been an unmitigated disappointment thus far
      in the lifetime of IPv6. Support in HW devices to use them for ECMP
      is lacking, and OSes don't turn them on by default. If we had these
      we could get much better hashing in IPv6 networks without resorting
      to DPI, possibly eliminating some of the motivations to to define new
      encaps in UDP just for getting ECMP.
      
      Unfortunately, the initial specfications of IPv6 did not clarify
      how they are to be used. There has always been a vague concept that
      these can be used for ECMP, flow hashing, etc. and we do now have a
      good standard how to this in RFC6438. The problem is that flow labels
      can be either stateful or stateless (as in RFC6438), and we are
      presented with the possibility that a stateless label may collide
      with a stateful one.  Attempts to split the flow label space were
      rejected in IETF. When we added support in Linux for RFC6438, we
      could not turn on flow labels by default due to this conflict.
      
      This patch splits the flow label space and should give us
      a path to enabling auto flow labels by default for all IPv6 packets.
      This is an API change so we need to consider compatibility with
      existing deployment. The stateful range is chosen to be the lower
      values in hopes that most uses would have chosen small numbers.
      
      Once we resolve the stateless/stateful issue, we can proceed to
      look at enabling RFC6438 flow labels by default (starting with
      scaled testing).
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      82a584b7
  5. 08 4月, 2015 3 次提交
  6. 26 3月, 2015 1 次提交
  7. 19 3月, 2015 1 次提交
  8. 28 2月, 2015 1 次提交
  9. 10 2月, 2015 1 次提交
  10. 05 2月, 2015 1 次提交
    • E
      ipv6: fix sparse errors in ip6_make_flowlabel() · 67765146
      Eric Dumazet 提交于
      include/net/ipv6.h:713:22: warning: incorrect type in assignment (different base types)
      include/net/ipv6.h:713:22:    expected restricted __be32 [usertype] hash
      include/net/ipv6.h:713:22:    got unsigned int
      include/net/ipv6.h:719:25: warning: restricted __be32 degrades to integer
      include/net/ipv6.h:719:22: warning: invalid assignment: ^=
      include/net/ipv6.h:719:22:    left side has type restricted __be32
      include/net/ipv6.h:719:22:    right side has type unsigned int
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67765146
  11. 04 2月, 2015 1 次提交
    • V
      ipv6: Select fragment id during UFO segmentation if not set. · 0508c07f
      Vlad Yasevich 提交于
      If the IPv6 fragment id has not been set and we perform
      fragmentation due to UFO, select a new fragment id.
      We now consider a fragment id of 0 as unset and if id selection
      process returns 0 (after all the pertrubations), we set it to
      0x80000000, thus giving us ample space not to create collisions
      with the next packet we may have to fragment.
      
      When doing UFO integrity checking, we also select the
      fragment id if it has not be set yet.   This is stored into
      the skb_shinfo() thus allowing UFO to function correclty.
      
      This patch also removes duplicate fragment id generation code
      and moves ipv6_select_ident() into the header as it may be
      used during GSO.
      Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0508c07f
  12. 03 2月, 2015 1 次提交
    • V
      ipv6: introduce ipv6_make_skb · 6422398c
      Vlad Yasevich 提交于
      This commit is very similar to
      commit 1c32c5ad
      Author: Herbert Xu <herbert@gondor.apana.org.au>
      Date:   Tue Mar 1 02:36:47 2011 +0000
      
          inet: Add ip_make_skb and ip_finish_skb
      
      It adds IPv6 version of the helpers ip6_make_skb and ip6_finish_skb.
      
      The job of ip6_make_skb is to collect messages into an ipv6 packet
      and poplulate ipv6 eader.  The job of ip6_finish_skb is to transmit
      the generated skb.  Together they replicated the job of
      ip6_push_pending_frames() while also provide the capability to be
      called independently.  This will be needed to add lockless UDP sendmsg
      support.
      Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6422398c
  13. 31 10月, 2014 1 次提交
  14. 29 9月, 2014 1 次提交
  15. 05 9月, 2014 1 次提交
  16. 28 7月, 2014 2 次提交
  17. 17 7月, 2014 1 次提交
    • J
      net: clean up some sparse endianness warnings in ipv6.h · 1373a773
      Jeff Layton 提交于
      sparse is throwing warnings when building sunrpc modules due to some
      endianness shenanigans in ipv6.h. Specifically:
      
        CHECK   net/sunrpc/addr.c
      include/net/ipv6.h:573:17: warning: restricted __be64 degrades to integer
      include/net/ipv6.h:577:34: warning: restricted __be32 degrades to integer
      include/net/ipv6.h:573:17: warning: restricted __be64 degrades to integer
      include/net/ipv6.h:577:34: warning: restricted __be32 degrades to integer
      
      Sprinkle some endianness fixups to silence them. These should all get
      fixed up at compile time, so I don't think this will add any extra work
      to be done at runtime.
      Signed-off-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1373a773
  18. 09 7月, 2014 1 次提交
    • F
      net: provide stubs for ip6_set_txhash and ip6_make_flowlabel · a37934fc
      Florian Fainelli 提交于
      Commit cb1ce2ef ("ipv6: Implement automatic flow label generation
      on transmit") introduced ip6_make_flowlabel, while commit b73c3d0e
      ("net: Save TX flow hash in sock and set in skbuf on xmit") introduced
      ip6_set_txhash.
      
      ip6_set_tx_hash() uses sk_v6_daddr which references
      __sk_common.skc_v6_daddr from struct sock_common, which is gated with
      IS_ENABLED(CONFIG_IPV6).
      
      ip6_make_flowlabel() uses the ipv6 member from struct net which is
      also gated with IS_ENABLED(CONFIG_IPV6).
      
      When CONFIG_IPV6 is disabled, we will hit a build failure that looks
      like this when the compiler attempts inlining these functions:
      
        CC [M]  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.o
      In file included from include/net/inet_sock.h:27:0,
                       from include/net/ip.h:30,
                       from drivers/net/ethernet/broadcom/cnic.c:37:
      include/net/ipv6.h: In function 'ip6_set_txhash':
      include/net/sock.h:327:33: error: 'struct sock_common' has no member named 'skc_v6_daddr'
       #define sk_v6_daddr  __sk_common.skc_v6_daddr
                                       ^
      include/net/ipv6.h:696:49: note: in expansion of macro 'sk_v6_daddr'
        keys.dst = (__force __be32)ipv6_addr_hash(&sk->sk_v6_daddr);
                                                       ^
      In file included from include/net/inetpeer.h:15:0,
                       from include/net/route.h:28,
                       from include/net/ip.h:31,
                       from drivers/net/ethernet/broadcom/cnic.c:37:
      include/net/ipv6.h: In function 'ip6_make_flowlabel':
      include/net/ipv6.h:706:37: error: 'struct net' has no member named 'ipv6'
        if (!flowlabel && (autolabel || net->ipv6.sysctl.auto_flowlabels)) {
                                           ^
      
      Fixes: cb1ce2ef ("ipv6: Implement automatic flow label generation on transmit")
      Fixes: b73c3d0e ("net: Save TX flow hash in sock and set in skbuf on xmit")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a37934fc
  19. 08 7月, 2014 2 次提交
    • T
      ipv6: Implement automatic flow label generation on transmit · cb1ce2ef
      Tom Herbert 提交于
      Automatically generate flow labels for IPv6 packets on transmit.
      The flow label is computed based on skb_get_hash. The flow label will
      only automatically be set when it is zero otherwise (i.e. flow label
      manager hasn't set one). This supports the transmit side functionality
      of RFC 6438.
      
      Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
      system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
      functionality per socket.
      
      By default, auto flowlabels are disabled to avoid possible conflicts
      with flow label manager, however if this feature proves useful we
      may want to enable it by default.
      
      It should also be noted that FreeBSD has already implemented automatic
      flow labels (including the sysctl and socket option). In FreeBSD,
      automatic flow labels default to enabled.
      
      Performance impact:
      
      Running super_netperf with 200 flows for TCP_RR and UDP_RR for
      IPv6. Note that in UDP case, __skb_get_hash will be called for
      every packet with explains slight regression. In the TCP case
      the hash is saved in the socket so there is no regression.
      
      Automatic flow labels disabled:
      
        TCP_RR:
          86.53% CPU utilization
          127/195/322 90/95/99% latencies
          1.40498e+06 tps
      
        UDP_RR:
          90.70% CPU utilization
          118/168/243 90/95/99% latencies
          1.50309e+06 tps
      
      Automatic flow labels enabled:
      
        TCP_RR:
          85.90% CPU utilization
          128/199/337 90/95/99% latencies
          1.40051e+06
      
        UDP_RR
          92.61% CPU utilization
          115/164/236 90/95/99% latencies
          1.4687e+06
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb1ce2ef
    • T
      net: Save TX flow hash in sock and set in skbuf on xmit · b73c3d0e
      Tom Herbert 提交于
      For a connected socket we can precompute the flow hash for setting
      in skb->hash on output. This is a performance advantage over
      calculating the skb->hash for every packet on the connection. The
      computation is done using the common hash algorithm to be consistent
      with computations done for packets of the connection in other states
      where thers is no socket (e.g. time-wait, syn-recv, syn-cookies).
      
      This patch adds sk_txhash to the sock structure. inet_set_txhash and
      ip6_set_txhash functions are added which are called from points in
      TCP and UDP where socket moves to established state.
      
      skb_set_hash_from_sk is a function which sets skb->hash from the
      sock txhash value. This is called in UDP and TCP transmit path when
      transmitting within the context of a socket.
      
      Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
      interface (in this case skb_get_hash called on every TX packet to
      create a UDP source port).
      
      Before fix:
      
        95.02% CPU utilization
        154/256/505 90/95/99% latencies
        1.13042e+06 tps
      
        Time in functions:
          0.28% skb_flow_dissect
          0.21% __skb_get_hash
      
      After fix:
      
        94.95% CPU utilization
        156/254/485 90/95/99% latencies
        1.15447e+06
      
        Neither __skb_get_hash nor skb_flow_dissect appear in perf
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b73c3d0e
  20. 03 6月, 2014 1 次提交
    • E
      inetpeer: get rid of ip_id_count · 73f156a6
      Eric Dumazet 提交于
      Ideally, we would need to generate IP ID using a per destination IP
      generator.
      
      linux kernels used inet_peer cache for this purpose, but this had a huge
      cost on servers disabling MTU discovery.
      
      1) each inet_peer struct consumes 192 bytes
      
      2) inetpeer cache uses a binary tree of inet_peer structs,
         with a nominal size of ~66000 elements under load.
      
      3) lookups in this tree are hitting a lot of cache lines, as tree depth
         is about 20.
      
      4) If server deals with many tcp flows, we have a high probability of
         not finding the inet_peer, allocating a fresh one, inserting it in
         the tree with same initial ip_id_count, (cf secure_ip_id())
      
      5) We garbage collect inet_peer aggressively.
      
      IP ID generation do not have to be 'perfect'
      
      Goal is trying to avoid duplicates in a short period of time,
      so that reassembly units have a chance to complete reassembly of
      fragments belonging to one message before receiving other fragments
      with a recycled ID.
      
      We simply use an array of generators, and a Jenkin hash using the dst IP
      as a key.
      
      ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
      belongs (it is only used from this file)
      
      secure_ip_id() and secure_ipv6_id() no longer are needed.
      
      Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
      unnecessary decrement/increment of the number of segments.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73f156a6
  21. 14 5月, 2014 1 次提交
    • L
      net: add a sysctl to reflect the fwmark on replies · e110861f
      Lorenzo Colitti 提交于
      Kernel-originated IP packets that have no user socket associated
      with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
      are emitted with a mark of zero. Add a sysctl to make them have
      the same mark as the packet they are replying to.
      
      This allows an administrator that wishes to do so to use
      mark-based routing, firewalling, etc. for these replies by
      marking the original packets inbound.
      
      Tested using user-mode linux:
       - ICMP/ICMPv6 echo replies and errors.
       - TCP RST packets (IPv4 and IPv6).
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e110861f
  22. 01 5月, 2014 1 次提交
  23. 16 4月, 2014 1 次提交
  24. 22 1月, 2014 1 次提交
  25. 20 1月, 2014 1 次提交
  26. 16 1月, 2014 1 次提交
  27. 02 1月, 2014 1 次提交
  28. 10 12月, 2013 2 次提交
  29. 06 12月, 2013 2 次提交
  30. 24 11月, 2013 1 次提交