1. 18 Dec, 2013 14 commits
    • net: Add utility function to copy skb hash · 3df7a74e
      Tom Herbert committed
      Adds skb_copy_hash to copy rxhash and l4_rxhash from one skb to another.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add function to set the rxhash · 09323cc4
      Tom Herbert committed
      The function skb_set_rxash was added for drivers to call to set
      the rxhash in an skb. The type of hash is also specified as
      a parameter (L2, L3, L4, or unknown type).
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add utility functions to clear rxhash · 7539fadc
      Tom Herbert committed
      In several places 'skb->rxhash = 0' is being done to clear the
      rxhash value in an skb.  This does not clear l4_rxhash, which could
      still be set, so the rxhash wouldn't be recalculated on a subsequent
      call to skb_get_rxhash.  This patch adds an explicit function to clear
      all the rxhash related information in the skb properly.

      skb_clear_hash_if_not_l4 clears the rxhash only if it is not marked as
      l4_rxhash.

      Fixed up places where 'skb->rxhash = 0' was being called.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
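      The difference between the two clearing helpers described in the commit
      above can be sketched in user-space C. This is only an illustration:
      struct mock_skb and the mock_* functions are hypothetical stand-ins, not
      the kernel's sk_buff or its actual helpers, and the field types are
      assumptions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff fields discussed above. */
struct mock_skb {
	uint32_t rxhash;        /* cached flow hash; 0 means "not computed" */
	unsigned int l4_rxhash; /* nonzero when rxhash was computed over L4 */
};

/* Clear all hash state so that a later lookup recomputes the hash. */
static void mock_clear_hash(struct mock_skb *skb)
{
	skb->rxhash = 0;
	skb->l4_rxhash = 0;
}

/* Clear the hash only when it is not marked as an L4 hash. */
static void mock_clear_hash_if_not_l4(struct mock_skb *skb)
{
	if (!skb->l4_rxhash)
		mock_clear_hash(skb);
}
```

      Writing only 'rxhash = 0' would leave l4_rxhash set, which is the stale
      state the commit message describes.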
    • net: Change skb_get_rxhash to skb_get_hash · 3958afa1
      Tom Herbert committed
      Rename the function as part of making the hash in skbuff a
      generic property, not just one for the receive path.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add resend_igmp attribute netlink support · d8838de7
      sfeldma@cumulusnetworks.com committed
      Add IFLA_BOND_RESEND_IGMP to allow get/set of bonding parameter
      resend_igmp via netlink.
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add xmit_hash_policy attribute netlink support · f70161c6
      sfeldma@cumulusnetworks.com committed
      Add IFLA_BOND_XMIT_HASH_POLICY to allow get/set of bonding parameter
      xmit_hash_policy via netlink.
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add fail_over_mac attribute netlink support · 89901972
      sfeldma@cumulusnetworks.com committed
      Add IFLA_BOND_FAIL_OVER_MAC to allow get/set of bonding parameter
      fail_over_mac via netlink.
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add primary_select attribute netlink support · 8a41ae44
      sfeldma@cumulusnetworks.com committed
      Add IFLA_BOND_PRIMARY_SELECT to allow get/set of bonding parameter
      primary_select via netlink.
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add primary attribute netlink support · 0a98a0d1
      sfeldma@cumulusnetworks.com committed
      Add IFLA_BOND_PRIMARY to allow get/set of bonding parameter
      primary via netlink.
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: refine TSO splits · d4589926
      Eric Dumazet committed
      While investigating performance problems on small RPC workloads,
      I noticed the Linux TCP stack was always splitting the last TSO skb
      into two parts (skbs): one being a multiple of MSS, and a small one
      with the Push flag. This split is done even if TCP_NODELAY is set,
      or if no small packet is in flight.
      
      Example with request/response of 4K/4K
      
      IP A > B: . ack 68432 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP A > B: . 65537:68433(2896) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP A > B: P 68433:69633(1200) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP B > A: . ack 68433 win 2768 <nop,nop,timestamp 6525001 6524593>
      IP B > A: . 69632:72528(2896) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
      IP B > A: P 72528:73728(1200) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
      IP A > B: . ack 72528 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP A > B: . 69633:72529(2896) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP A > B: P 72529:73729(1200) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
      
      We can avoid this split by including the Nagle tests at the right place.
      
      Note : If some NIC had trouble sending TSO packets with a partial
      last segment, we would have hit the problem in GRO/forwarding workload already.
      
      tcp_minshall_update() is moved to tcp_output.c and is updated as we might
      feed a TSO packet with a partial last segment.
      
      This patch tremendously improves performance, as the traffic now looks
      like :
      
      IP A > B: . ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
      IP A > B: P 94209:98305(4096) ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
      IP B > A: . ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
      IP B > A: P 98304:102400(4096) ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
      IP A > B: . ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
      IP A > B: P 98305:102401(4096) ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
      IP B > A: . ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
      IP B > A: P 102400:106496(4096) ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
      IP A > B: . ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
      IP A > B: P 102401:106497(4096) ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
      IP B > A: . ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
      IP B > A: P 106496:110592(4096) ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
      
      Before :
      
      lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
      280774
      
       Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':
      
           205719.049006 task-clock                #    9.278 CPUs utilized
               8,449,968 context-switches          #    0.041 M/sec
               1,935,997 CPU-migrations            #    0.009 M/sec
                 160,541 page-faults               #    0.780 K/sec
         548,478,722,290 cycles                    #    2.666 GHz                     [83.20%]
         455,240,670,857 stalled-cycles-frontend   #   83.00% frontend cycles idle    [83.48%]
         272,881,454,275 stalled-cycles-backend    #   49.75% backend  cycles idle    [66.73%]
         166,091,460,030 instructions              #    0.30  insns per cycle
                                                   #    2.74  stalled cycles per insn [83.39%]
          29,150,229,399 branches                  #  141.699 M/sec                   [83.30%]
           1,943,814,026 branch-misses             #    6.67% of all branches         [83.32%]
      
            22.173517844 seconds time elapsed
      
      lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
      IpOutRequests                   16851063           0.0
      IpExtOutOctets                  23878580777        0.0
      
      After patch :
      
      lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
      280877
      
       Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':
      
           107496.071918 task-clock                #    4.847 CPUs utilized
               5,635,458 context-switches          #    0.052 M/sec
               1,374,707 CPU-migrations            #    0.013 M/sec
                 160,920 page-faults               #    0.001 M/sec
         281,500,010,924 cycles                    #    2.619 GHz                     [83.28%]
         228,865,069,307 stalled-cycles-frontend   #   81.30% frontend cycles idle    [83.38%]
         142,462,742,658 stalled-cycles-backend    #   50.61% backend  cycles idle    [66.81%]
          95,227,712,566 instructions              #    0.34  insns per cycle
                                                   #    2.40  stalled cycles per insn [83.43%]
          16,209,868,171 branches                  #  150.795 M/sec                   [83.20%]
             874,252,952 branch-misses             #    5.39% of all branches         [83.37%]
      
            22.175821286 seconds time elapsed
      
      lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
      IpOutRequests                   11239428           0.0
      IpExtOutOctets                  23595191035        0.0
      
      Indeed, the occupancy of tx skbs (IpExtOutOctets/IpOutRequests) is higher :
      2099 instead of 1417, thus helping GRO to be more efficient when using FQ packet
      scheduler.
      
      Many thanks to Neal for review and ideas.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Tested-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
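      The occupancy figures quoted in the commit above follow from dividing the
      two nstat counters. A minimal sketch of that arithmetic, assuming integer
      division is the intended rounding; the helper name tx_occupancy is
      hypothetical and not part of the patch:

```c
#include <stdint.h>

/*
 * Average bytes per transmitted IP packet (IpExtOutOctets / IpOutRequests).
 * Returns 0 when no packets were sent, to avoid dividing by zero.
 */
static uint64_t tx_occupancy(uint64_t out_octets, uint64_t out_requests)
{
	return out_requests ? out_octets / out_requests : 0;
}
```

      With the counters listed above, tx_occupancy(23878580777, 16851063)
      yields 1417 before the patch and tx_occupancy(23595191035, 11239428)
      yields 2099 after it. The packet count drops by about a third while the
      byte count stays nearly the same.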
    • net: remove dead code for add/del multiple · 477bb933
      stephen hemminger committed
      These functions to manipulate multiple addresses are not used anywhere
      in the current net-next tree. Some out-of-tree code may be using these, but
      too bad; they should submit their code upstream.

      Also, make __hw_addr_flush local since it is only used by dev_addr_lists.c
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: provide phy_resume/phy_suspend helpers · 481b5d93
      Sebastian Hesselbarth committed
      This adds helper functions to resume and suspend a given phy_device
      by calling the corresponding driver callbacks if available.
      Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Reorder 'struct association' members to reduce its size · be78cfcb
      wangweidong committed
      Members of 'struct association' are not in an appropriate order to
      reuse compiler-added padding on 64-bit architectures. In this patch
      we reorder those struct members and reduce the size of the
      structure from 2776 bytes to 2720 bytes on 64-bit architectures.
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
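      The effect of member ordering on structure size can be shown with a toy
      example. The two structs below are hypothetical and unrelated to the
      actual SCTP association layout; they only illustrate how grouping small
      members lets them share one padded slot.

```c
#include <stdint.h>

/* Hypothetical layout: each 1-byte member is followed by alignment padding. */
struct padded {
	uint8_t  flag_a;
	uint64_t counter;
	uint8_t  flag_b;
};

/* Same members, with the small ones grouped after the 8-byte member. */
struct reordered {
	uint64_t counter;
	uint8_t  flag_a;
	uint8_t  flag_b;
};
```

      On a typical 64-bit ABI, sizeof(struct padded) is 24 and
      sizeof(struct reordered) is 16; exact values depend on the compiler and
      target.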
    • lib: introduce arch optimized hash library · 71ae8aac
      Francesco Fusco committed
      We introduce a new hashing library that is meant to be used in
      contexts where speed is more important than uniformity of the
      hashed values. The hash library leverages architecture-specific
      implementations to achieve high performance and falls back to
      jhash() for the generic case.

      On Intel-based x86 architectures, the library can exploit the crc32l
      instruction, part of the Intel SSE4.2 instruction set, if the
      instruction is supported by the processor. This implementation
      is twice as fast as the jhash() implementation on an i7 processor.

      Additional architectures, such as Arm64, provide instructions for
      accelerating the computation of CRC, so they could be added as well
      in follow-up work.
      Signed-off-by: Francesco Fusco <ffusco@redhat.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Thomas Graf <tgraf@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
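      The SSE4.2 crc32 instruction family computes CRC-32C (the Castagnoli
      polynomial). As a rough illustration of what the hardware accelerates,
      here is a portable bitwise software model. It is not the library's code:
      the function name crc32c_sw is hypothetical, the initial value and final
      inversion are the conventional CRC-32C parameters rather than anything
      stated in the commit, and the library's generic fallback is jhash(), not
      a software CRC.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_sw(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu; /* conventional initial value */

	while (len--) {
		crc ^= *p++;
		/* Process one bit at a time, least significant bit first. */
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc; /* conventional final inversion */
}
```

      For the standard check string "123456789" (9 bytes) this returns
      0xE3069283. A hardware instruction does the per-word work in a single
      step, which is where the speedup over a general-purpose hash comes from.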
  2. 14 Dec, 2013 10 commits
  3. 13 Dec, 2013 2 commits
    • net-gro: Prepare GRO stack for the upcoming tunneling support · 299603e8
      Jerry Chu committed
      This patch modifies the GRO stack to avoid the use of "network_header"
      and associated macros like ip_hdr() and ipv6_hdr() in order to allow
      an arbitrary number of IP hdrs (v4 or v6) to be used in the
      encapsulation chain. This lays the foundation for various IP
      tunneling support (IP-in-IP, GRE, VXLAN, SIT,...) to be added later.

      With this patch, the GRO stack traversal is now mostly based on
      skb_gro_offset rather than special hdr offsets saved in skb (e.g.,
      skb->network_header). As a result, all but the top layer (i.e., the
      transport layer) must have hdrs of the same length in order for
      a pkt to be considered for aggregation. Therefore when adding a new
      encap layer (e.g., for tunneling), one must check and skip flows
      (e.g., by setting NAPI_GRO_CB(p)->same_flow to 0) that have a
      different hdr length.

      Note that unlike the network header, the transport header can and
      will continue to be set by the GRO code since there will be at
      most one "transport layer" in the encap chain.
      Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvlan: Remove custom receive and forward handlers · 2f6a1b66
      Vlad Yasevich committed
      Since macvlan and macvtap now use the same receive and
      forward handlers, we can remove them completely and use
      netif_rx and dev_forward_skb() directly.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 12 Dec, 2013 2 commits
    • ipv6: router reachability probing · 7e980569
      Jiri Benc committed
      RFC 4191 states in 3.5:
      
         When a host avoids using any non-reachable router X and instead sends
         a data packet to another router Y, and the host would have used
         router X if router X were reachable, then the host SHOULD probe each
         such router X's reachability by sending a single Neighbor
         Solicitation to that router's address.  A host MUST NOT probe a
         router's reachability in the absence of useful traffic that the host
         would have sent to the router if it were reachable.  In any case,
         these probes MUST be rate-limited to no more than one per minute per
         router.
      
      Currently, when the neighbour corresponding to a router falls into
      NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state
      value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but
      should be probed with a single NS. The probe is ratelimited by the existing
      code. To better distinguish meanings of the failure values, rename
      RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
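      The RFC 4191 limit of one probe per minute per router can be sketched as
      a small per-router rate limiter. This is an illustration only, not the
      kernel's existing rate-limiting code that the commit relies on; struct
      router_probe, router_may_probe and the use of seconds from a monotonic
      clock are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* RFC 4191, section 3.5: no more than one probe per minute per router. */
#define PROBE_INTERVAL_SEC 60

/* Hypothetical per-router probe state. */
struct router_probe {
	bool     probed;     /* has any probe been sent yet? */
	uint64_t last_probe; /* time of the last probe, in seconds */
};

/*
 * Return true and record the time if a probe may be sent at 'now'.
 * 'now' is assumed to come from a monotonic clock.
 */
static bool router_may_probe(struct router_probe *rp, uint64_t now)
{
	if (rp->probed && now - rp->last_probe < PROBE_INTERVAL_SEC)
		return false;
	rp->probed = true;
	rp->last_probe = now;
	return true;
}
```

      A caller would check router_may_probe() only when it actually has
      traffic that would have used the unreachable router, since the RFC
      forbids probing in the absence of such traffic.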
    • ipv4: fix wildcard search with inet_confirm_addr() · b601fa19
      Nicolas Dichtel committed
      The documentation of this function says: "in_dev: only on this interface, 0=any interface",
      but since commit 39a6d063 ("[NETNS]: Process inet_confirm_addr in the
      correct namespace."), the code assumes that it will never be NULL. This
      function is never called with in_dev == NULL, but it's exported and may be used
      by an external module.

      Because this patch restores the ability to call inet_confirm_addr() with in_dev
      == NULL, I partially revert the above commit, as suggested by Julian.

      CC: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 11 Dec, 2013 5 commits
  6. 10 Dec, 2013 7 commits