1. 08 Jan, 2014 1 commit
    • B
      net: Do not enable tx-nocache-copy by default · cdb3f4a3
      Benjamin Poirier committed
      There are many cases where this feature does not improve performance or even
      reduces it.
      
      For example, here are the results from tests that I've run using 3.12.6 on one
      Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are
      from the Xeon, but they're similar on the i7. All numbers report the
      mean±stddev over 10 runs of 10s.
      
      1) latency tests similar to what is described in "c6e1a0d1 net: Allow no-cache
      copy from user on transmit"
      There is no statistically significant difference between tx-nocache-copy
      on/off.
      nic irqs spread out (one queue per cpu)
      
      200x netperf -r 1400,1
      tx-nocache-copy off
              692000±1000 tps
              50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3
      tx-nocache-copy on
              693000±1000 tps
              50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7
      
      200x netperf -r 14000,14000
      tx-nocache-copy off
              86450±80 tps
              50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40
      tx-nocache-copy on
              86110±60 tps
              50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20
      
      2) single stream throughput tests
      tx-nocache-copy leads to higher service demand
      
                              throughput  cpu0        cpu1        demand
                              (Mb/s)      (Gcycle)    (Gcycle)    (cycle/B)
      
      nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)
      
      tx-nocache-copy off     9402±5      9.4±0.2                 0.80±0.01
      tx-nocache-copy on      9403±3      9.85±0.04               0.838±0.004
      
      nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)
      
      tx-nocache-copy off     9401±5      5.83±0.03   5.0±0.1     0.923±0.007
      tx-nocache-copy on      9404±2      5.74±0.03   5.523±0.009 0.958±0.002
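The demand column can be cross-checked against the other columns. The following is a minimal sketch, assuming each run lasts 10 s (as stated above), that the throughput values are in Mb/s (the only reading consistent with 10 GbE ixgbe adapters), and that demand is the total cycle count across the listed CPUs divided by the bytes sent. The function name `service_demand` is hypothetical.

```python
def service_demand(throughput_mbps, gcycles_per_cpu, duration_s=10.0):
    """Return cycles spent per byte sent (cycle/B)."""
    # Mb/s -> bytes over the whole run
    bytes_sent = throughput_mbps * 1e6 / 8 * duration_s
    # Gcycle -> cycles, summed over every CPU involved
    total_cycles = sum(gcycles_per_cpu) * 1e9
    return total_cycles / bytes_sent


# Mean values copied from the table above (stddev ignored).
same_cpu_off = service_demand(9402, [9.4])
same_cpu_on = service_demand(9403, [9.85])
split_cpu_off = service_demand(9401, [5.83, 5.0])
split_cpu_on = service_demand(9404, [5.74, 5.523])

for label, value in [
    ("cpu0 only, off", same_cpu_off),
    ("cpu0 only, on", same_cpu_on),
    ("cpu0+cpu1, off", split_cpu_off),
    ("cpu0+cpu1, on", split_cpu_on),
]:
    print(f"{label}: {value:.3f} cycle/B")
```

The recomputed values agree with the reported demand figures to within the stated deviations, and in both placements the "on" case costs more cycles per byte.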
      
      As a second example, here are some results from Eric Dumazet with latest
      net-next.
      tx-nocache-copy also leads to higher service demand
      
      (cpu is Intel(R) Xeon(R) CPU X5660  @ 2.80GHz)
      
      lpq83:~# ./ethtool -K eth0 tx-nocache-copy on
      lpq83:~# perf stat ./netperf -H lpq84 -c
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
      
       87380  16384  16384    10.00      9407.44   2.50     -1.00    0.522   -1.000
      
       Performance counter stats for './netperf -H lpq84 -c':
      
             4282.648396 task-clock                #    0.423 CPUs utilized
                   9,348 context-switches          #    0.002 M/sec
                      88 CPU-migrations            #    0.021 K/sec
                     355 page-faults               #    0.083 K/sec
          11,812,797,651 cycles                    #    2.758 GHz                     [82.79%]
           9,020,522,817 stalled-cycles-frontend   #   76.36% frontend cycles idle    [82.54%]
           4,579,889,681 stalled-cycles-backend    #   38.77% backend  cycles idle    [67.33%]
           6,053,172,792 instructions              #    0.51  insns per cycle
                                                   #    1.49  stalled cycles per insn [83.64%]
             597,275,583 branches                  #  139.464 M/sec                   [83.70%]
               8,960,541 branch-misses             #    1.50% of all branches         [83.65%]
      
            10.128990264 seconds time elapsed
      
      lpq83:~# ./ethtool -K eth0 tx-nocache-copy off
      lpq83:~# perf stat ./netperf -H lpq84 -c
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
      
       87380  16384  16384    10.00      9412.45   2.15     -1.00    0.449   -1.000
      
       Performance counter stats for './netperf -H lpq84 -c':
      
             2847.375441 task-clock                #    0.281 CPUs utilized
                  11,632 context-switches          #    0.004 M/sec
                      49 CPU-migrations            #    0.017 K/sec
                     354 page-faults               #    0.124 K/sec
           7,646,889,749 cycles                    #    2.686 GHz                     [83.34%]
           6,115,050,032 stalled-cycles-frontend   #   79.97% frontend cycles idle    [83.31%]
           1,726,460,071 stalled-cycles-backend    #   22.58% backend  cycles idle    [66.55%]
           2,079,702,453 instructions              #    0.27  insns per cycle
                                                   #    2.94  stalled cycles per insn [83.22%]
             363,773,213 branches                  #  127.757 M/sec                   [83.29%]
               4,242,732 branch-misses             #    1.17% of all branches         [83.51%]
      
            10.128449949 seconds time elapsed
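To make the two perf runs easier to compare, the counters above can be reduced to a few ratios. This is only an illustrative sketch using numbers copied from the two outputs; the dictionary layout and helper names are hypothetical.

```python
# Counters copied from the two `perf stat` outputs above.
runs = {
    "on": {
        "cycles": 11_812_797_651,
        "instructions": 6_053_172_792,
        "send_us_per_kb": 0.522,
    },
    "off": {
        "cycles": 7_646_889_749,
        "instructions": 2_079_702_453,
        "send_us_per_kb": 0.449,
    },
}


def on_off_ratio(metric):
    """How much larger a metric is with tx-nocache-copy on than off."""
    return runs["on"][metric] / runs["off"][metric]


def insns_per_cycle(run):
    """Instructions per cycle, as perf reports in its right-hand column."""
    return runs[run]["instructions"] / runs[run]["cycles"]


for metric in ("cycles", "instructions", "send_us_per_kb"):
    print(f"{metric}: on/off = {on_off_ratio(metric):.3f}")
```

With tx-nocache-copy on, the sender spends roughly 1.5 times the cycles and about 16% more local service demand for essentially the same throughput.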
      
      CC: Tom Herbert <therbert@google.com>
      Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 07 Jan, 2014 1 commit
  3. 06 Jan, 2014 1 commit
  4. 04 Jan, 2014 3 commits
  5. 03 Jan, 2014 1 commit
  6. 02 Jan, 2014 2 commits
  7. 01 Jan, 2014 2 commits
    • Z
    • D
      vlan: Fix header ops passthru when doing TX VLAN offload. · 2205369a
      David S. Miller committed
      When the vlan code detects that the real device can do TX VLAN offloads
      in hardware, it tries to arrange for the real device's header_ops to
      be invoked directly.
      
      But it does so illegally, by simply hooking the real device's
      header_ops up to the VLAN device.
      
      This doesn't work because we will end up invoking a set of header_ops
      routines which expect a device type which matches the real device, but
      will see a VLAN device instead.
      
      Fix this by providing a pass-thru set of header_ops which will arrange
      to pass the proper real device instead.
      
      To facilitate this, add a dev_rebuild_header().  There are
      implementations which provide a ->cache and ->create but not a
      ->rebuild (e.g. PLIP).  So we need a helper function just like
      dev_hard_header() to avoid crashes.
      
      Use this helper in the one existing place where the
      header_ops->rebuild was being invoked, the neighbour code.
      
      With lots of help from Florian Westphal.
      Signed-off-by: David S. Miller <davem@davemloft.net>
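The difference between hooking up the real device's header_ops directly and using a pass-through set can be modelled in a few lines. This is a simplified Python model, not kernel code: the `Device` and `HeaderOps` classes, the string "headers", and `eth_create` are hypothetical stand-ins, and only `dev_rebuild_header` borrows its name from the commit.

```python
class HeaderOps:
    def __init__(self, create=None, rebuild=None):
        self.create = create
        self.rebuild = rebuild


class Device:
    def __init__(self, name, dev_type, header_ops=None, real_dev=None):
        self.name = name
        self.dev_type = dev_type
        self.header_ops = header_ops
        self.real_dev = real_dev


def eth_create(dev, payload):
    # A real device's routine expects to be handed its own device type.
    if dev.dev_type != "ether":
        raise TypeError(f"expected an ether device, got {dev.dev_type}")
    return f"[eth hdr via {dev.name}]" + payload


def vlan_passthru_create(vlan_dev, payload):
    # Pass-through: swap in the real device before calling its routine.
    real = vlan_dev.real_dev
    return real.header_ops.create(real, payload)


VLAN_PASSTHRU_OPS = HeaderOps(create=vlan_passthru_create)


def dev_rebuild_header(dev, payload):
    # Helper that tolerates missing ops or a missing rebuild routine.
    ops = dev.header_ops
    if ops is None or ops.rebuild is None:
        return None
    return ops.rebuild(dev, payload)


eth0 = Device("eth0", "ether", HeaderOps(create=eth_create))  # no rebuild
broken = Device("eth0.50", "vlan", eth0.header_ops, real_dev=eth0)
fixed = Device("eth0.50", "vlan", VLAN_PASSTHRU_OPS, real_dev=eth0)
```

In this model, calling `broken.header_ops.create(broken, ...)` fails because the routine sees a VLAN device, while the `fixed` device reaches the same routine with `eth0` substituted.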
  8. 22 Dec, 2013 1 commit
  9. 20 Dec, 2013 1 commit
  10. 18 Dec, 2013 5 commits
  11. 14 Dec, 2013 1 commit
  12. 13 Dec, 2013 1 commit
    • J
      net-gro: Prepare GRO stack for the upcoming tunneling support · 299603e8
      Jerry Chu committed
      This patch modifies the GRO stack to avoid the use of "network_header"
      and associated macros like ip_hdr() and ipv6_hdr() in order to allow
      an arbitrary number of IP hdrs (v4 or v6) to be used in the
      encapsulation chain. This lays the foundation for various IP
      tunneling support (IP-in-IP, GRE, VXLAN, SIT,...) to be added later.
      
      With this patch, GRO stack traversal is now mostly based on
      skb_gro_offset rather than special hdr offsets saved in skb (e.g.,
      skb->network_header). As a result all but the top layer (i.e., the
      transport layer) must have hdrs of the same length in order for
      a pkt to be considered for aggregation. Therefore when adding a new
      encap layer (e.g., for tunneling), one must check and skip flows
      (e.g., by setting NAPI_GRO_CB(p)->same_flow to 0) that have a
      different hdr length.
      
      Note that unlike the network header, the transport header can and
      will continue to be set by the GRO code since there will be at
      most one "transport layer" in the encap chain.
      Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
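The header-length rule described above can be illustrated with a toy model. This sketch is plain Python, not the kernel's GRO code: `GroPacket`, `layer_offsets`, and `mark_same_flow` are hypothetical names, and a packet is reduced to a list of header lengths, outermost first, with the transport header last.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class GroPacket:
    hdr_lens: list  # header lengths in bytes, outermost first
    same_flow: int = 1


def layer_offsets(hdr_lens):
    """Offset at which each layer starts, walking one running offset."""
    return [0] + list(accumulate(hdr_lens))[:-1]


def mark_same_flow(held, incoming):
    """Clear same_flow on held packets whose encap header lengths differ."""
    for p in held:
        # Every layer except the top (transport) one must match in length.
        if p.hdr_lens[:-1] != incoming.hdr_lens[:-1]:
            p.same_flow = 0


# Ethernet, outer IPv4, inner IPv4 (second packet carries IP options), TCP.
held = [GroPacket([14, 20, 20, 20]), GroPacket([14, 20, 24, 20])]
incoming = GroPacket([14, 20, 20, 32])
mark_same_flow(held, incoming)
```

Only the first held packet stays a merge candidate: its transport header length differs from the incoming packet's, which is allowed, whereas the second differs in an encapsulation layer.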
  13. 12 Dec, 2013 1 commit
    • J
      ipv6: router reachability probing · 7e980569
      Jiri Benc committed
      RFC 4191 states in 3.5:
      
         When a host avoids using any non-reachable router X and instead sends
         a data packet to another router Y, and the host would have used
         router X if router X were reachable, then the host SHOULD probe each
         such router X's reachability by sending a single Neighbor
         Solicitation to that router's address.  A host MUST NOT probe a
         router's reachability in the absence of useful traffic that the host
         would have sent to the router if it were reachable.  In any case,
         these probes MUST be rate-limited to no more than one per minute per
         router.
      
      Currently, when the neighbour corresponding to a router falls into
      NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state
      value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but
      should be probed with a single NS. The probe is ratelimited by the existing
      code. To better distinguish meanings of the failure values, rename
      RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
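The rate limit quoted from RFC 4191 (at most one probe per minute per router, and only when there is traffic that would have used that router) can be sketched as follows. This is an illustrative Python model, not the kernel's implementation: the class `RouterProber`, its method names, and the callback are hypothetical, and the 60-second interval is taken from the RFC text above.

```python
PROBE_INTERVAL_S = 60.0  # "no more than one per minute per router"


class RouterProber:
    def __init__(self, send_ns, interval_s=PROBE_INTERVAL_S):
        self._send_ns = send_ns        # callback that sends one NS
        self._interval = interval_s
        self._last_probe = {}          # router address -> time of last probe

    def on_traffic_avoiding(self, router, now):
        """Call when a packet was sent elsewhere because `router` is unreachable.

        Returns True if a Neighbor Solicitation was sent.
        """
        last = self._last_probe.get(router)
        if last is not None and now - last < self._interval:
            return False  # still inside the rate-limit window
        self._last_probe[router] = now
        self._send_ns(router)
        return True


sent = []
prober = RouterProber(sent.append)
results = [
    prober.on_traffic_avoiding("fe80::1", now=0.0),
    prober.on_traffic_avoiding("fe80::1", now=30.0),
    prober.on_traffic_avoiding("fe80::2", now=30.0),
    prober.on_traffic_avoiding("fe80::1", now=60.0),
]
```

Because probes are triggered only from the traffic path, no probe is sent in the absence of useful traffic, which matches the MUST NOT clause in the quoted text.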
  14. 11 Dec, 2013 3 commits
  15. 10 Dec, 2013 7 commits
  16. 07 Dec, 2013 2 commits
  17. 06 Dec, 2013 1 commit
  18. 02 Dec, 2013 1 commit
    • F
      {pktgen, xfrm} Update IPv4 header total len and checksum after tranformation · 3868204d
      fan.du committed
      Commit a553e4a6 ("[PKTGEN]: IPSEC support")
      tried to support IPsec ESP transport transformation for pktgen, but it
      does not actually work, for two reasons (Wireshark reports that the
      original transformed packet has a bad IPv4 checksum value as well as a
      wrong auth value):
      
      - After transformation, the IPv4 header total length needs to be updated,
        because the encrypted payload's length is NOT the same as that of the
        plain text.
      
      - After transformation, the IPv4 checksum needs to be recalculated because
        the payload has changed.
      
      With this patch, when pktgen is armed with the configuration below,
      Wireshark is able to decrypt the ESP packets generated by pktgen without
      any IPv4 checksum error or auth value error.
      
      pgset "flag IPSEC"
      pgset "flows 1"
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
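The two header fixes named above (update the total length, then recompute the checksum) look roughly like this in isolation. This is a minimal Python sketch of the standard IPv4 header checksum, not the pktgen code; the function names are hypothetical and the sample header is an arbitrary 20-byte IPv4/UDP header.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def update_ipv4_header(header: bytes, payload_len: int) -> bytes:
    """Return the header with total length and checksum refreshed."""
    ihl = (header[0] & 0x0F) * 4               # header length in bytes
    hdr = bytearray(header[:ihl])
    struct.pack_into("!H", hdr, 2, ihl + payload_len)  # total length field
    struct.pack_into("!H", hdr, 10, 0)                 # zero before summing
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


# Sample header with the checksum field (bytes 10-11) zeroed.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
# Pretend the transformed (encrypted) payload is 120 bytes long.
patched = update_ipv4_header(sample, payload_len=120)
```

A header whose checksum field is correct sums to zero under `ipv4_checksum`, which is a convenient way to validate the result.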
  19. 22 Nov, 2013 1 commit
  20. 21 Nov, 2013 2 commits
    • H
      net: rework recvmsg handler msg_name and msg_namelen logic · f3d33426
      Hannes Frederic Sowa committed
      This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
      set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
      to return msg_name to the user.
      
      This prevents numerous uninitialized memory leaks we had in the
      recvmsg handlers and makes it harder for new code to accidentally leak
      uninitialized memory.
      
      Optimize for the case recvfrom is called with NULL as address. We don't
      need to copy the address at all, so set it to NULL before invoking the
      recvmsg handler. We can do so, because all the recvmsg handlers must
      cope with the case a plain read() is called on them. read() also sets
      msg_name to NULL.
      
      Also document these changes in include/linux/net.h as suggested by David
      Miller.
      
      Changes since RFC:
      
      Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      affect sendto as it would bail out earlier while trying to copy-in the
      address. It also more naturally reflects the logic by the callers of
      verify_iovec.
      
      With this change in place I could remove "
      if (!uaddr || msg_sys->msg_namelen == 0)
      	msg->msg_name = NULL
      ".
      
      This change does not alter the user visible error logic as we ignore
      msg_namelen as long as msg_name is NULL.
      
      Also remove two unnecessary curly brackets in ___sys_recvmsg and change
      comments to netdev style.
      
      Cc: David Miller <davem@davemloft.net>
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
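The contract described above (the core passes msg_namelen as 0, and a handler must set it to return an address) can be modelled to show why a handler that forgets the address no longer leaks stale memory. This is a simplified Python model, not the kernel code path: `Msg`, the two handlers, and `sys_recvfrom` are hypothetical, and the `0xAA` fill stands in for uninitialized memory.

```python
SOCKADDR_STORAGE_SIZE = 128  # sizeof(struct sockaddr_storage) on Linux


class Msg:
    def __init__(self, want_addr):
        # NULL address from the caller: msg_name is cleared up front.
        self.msg_name = (
            bytearray(b"\xAA" * SOCKADDR_STORAGE_SIZE) if want_addr else None
        )
        self.msg_namelen = 0  # always passed to the handler as 0


ADDR = b"\x02\x00\x1f\x90\x7f\x00\x00\x01"  # sample address bytes


def good_handler(msg):
    if msg.msg_name is not None:
        msg.msg_name[: len(ADDR)] = ADDR
        msg.msg_namelen = len(ADDR)  # handler opts in to returning a name
    return b"payload"


def forgetful_handler(msg):
    # Never touches msg_name or msg_namelen.
    return b"payload"


def sys_recvfrom(handler, want_addr):
    msg = Msg(want_addr)
    data = handler(msg)
    if msg.msg_name is None:
        return data, None
    assert msg.msg_namelen <= SOCKADDR_STORAGE_SIZE
    # Only msg_namelen bytes are copied back to the caller.
    return data, bytes(msg.msg_name[: msg.msg_namelen])
```

In this model the forgetful handler yields an empty address rather than 128 bytes of filler, and a caller that passes no address buffer never triggers a copy at all.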
    • V
      net: core: Always propagate flag changes to interfaces · d2615bf4
      Vlad Yasevich committed
      The following commit:
          b6c40d68
          net: only invoke dev->change_rx_flags when device is UP
      
      tried to fix a problem with VLAN devices and promiscuous flag setting.
      The issue was that the VLAN device was setting a flag on an interface
      that was down, thus resulting in a bad promiscuity count.
      This commit blocked flag propagation to any device that is currently
      down.
      
      A later commit:
          deede2fa
          vlan: Don't propagate flag changes on down interfaces
      
      fixed VLAN code to only propagate flags when the VLAN interface is up,
      thus fixing the same issue as above, only localized to VLAN.
      
      The problem we have now is that if we create a complex stack
      involving multiple software devices like bridges, bonds, and vlans,
      then it is possible that the flags would not propagate properly to
      the physical devices.  A simple example of the scenario is the
      following:
      
        eth0----> bond0 ----> bridge0 ---> vlan50
      
      If bond0 or eth0 happen to be down at the time bond0 is added to
      the bridge, then eth0 will never have promisc mode set which is
      currently required for operation as part of the bridge.  As a
      result, packets with vlan50 will be dropped by the interface.
      
      The only 2 devices that implement the special flag handling are
      VLAN and DSA and they both have required code to prevent incorrect
      flag propagation.  As a result we can remove the generic solution
      introduced in b6c40d68 and leave
      it to the individual devices to decide whether they will block
      flag propagation or not.
      Reported-by: Stefan Priebe <s.priebe@profihost.ag>
      Suggested-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
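The bond0/eth0 scenario above can be reproduced with a toy model of stacked devices. This is a deliberately simplified Python sketch, not kernel code: `NetDev`, `set_promiscuity`, and the `only_when_up` switch are hypothetical, and the switch stands in for the generic "device must be UP" check that this commit removes.

```python
class NetDev:
    def __init__(self, name, up=True, lower=None):
        self.name = name
        self.up = up
        self.lower = lower        # device this one is stacked on, if any
        self.promiscuity = 0


def set_promiscuity(dev, inc, only_when_up):
    dev.promiscuity += inc
    if only_when_up and not dev.up:
        # Old generic check: the change never reaches the lower devices.
        return
    if dev.lower is not None:
        set_promiscuity(dev.lower, inc, only_when_up)


def enslave_bond_to_bridge(only_when_up):
    eth0 = NetDev("eth0")
    bond0 = NetDev("bond0", up=False, lower=eth0)  # bond0 is down right now
    # Adding bond0 to the bridge requests promiscuous mode on it.
    set_promiscuity(bond0, +1, only_when_up)
    return eth0.promiscuity


old_behaviour = enslave_bond_to_bridge(only_when_up=True)
new_behaviour = enslave_bond_to_bridge(only_when_up=False)
```

With the generic check in place, eth0 is left with a promiscuity count of zero even though the bridge needs it; without the check, the request reaches the physical device.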
  21. 20 Nov, 2013 2 commits