    tcp: auto corking · f54b3111
    Committed by Eric Dumazet
    With the introduction of TCP Small Queues, TSO auto sizing, and TCP
    pacing, we can implement Automatic Corking in the kernel, to help
    applications doing small write()/sendmsg() to TCP sockets.
    
    The idea is to change tcp_push() to check whether the current skb
    payload is under the skb optimal size (a multiple of MSS bytes).
    
    If it is under 'size_goal', and at least one packet is still in the Qdisc
    or NIC TX queues, set the TCP Small Queues throttled bit, so that the push
    will be delayed up to TX completion time.
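
    As a rough sketch of this check (the helper name and exact fields below
    are illustrative assumptions based on the description above, not
    necessarily the final kernel code):

        /* Defer the push if the skb is still below its optimal size and
         * earlier packets of this flow are still queued in the Qdisc or
         * NIC TX ring (their memory is still charged to the socket).
         */
        static bool tcp_should_autocork(struct sock *sk, struct sk_buff *skb,
                                        int size_goal)
        {
                return skb->len < size_goal &&
                       sysctl_tcp_autocorking &&
                       atomic_read(&sk->sk_wmem_alloc) > skb->truesize;
        }

        /* In tcp_push(): instead of pushing right away, mark the socket
         * TSQ-throttled so the flush happens at TX completion at the latest.
         */
        if (tcp_should_autocork(sk, skb, size_goal))
                set_bit(TSQ_THROTTLED, &tp->tsq_flags);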
    
    This delay might allow the application to coalesce more bytes into the
    skb in subsequent write()/sendmsg()/sendfile() system calls.
    
    The exact duration of the delay depends on the dynamics of the system,
    and might be zero if no packet for this flow is actually held in the
    Qdisc or NIC TX ring.
    
    Using FQ/pacing is a way to increase the probability of
    autocorking being triggered.
    
    Add a new sysctl (/proc/sys/net/ipv4/tcp_autocorking) to control
    this feature and default it to 1 (enabled)
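
    The sysctl can also be toggled from a program instead of the shell; a
    minimal user-space sketch (it simply writes the procfs file named above):

        #include <stdio.h>

        int main(void)
        {
                FILE *f = fopen("/proc/sys/net/ipv4/tcp_autocorking", "w");

                if (!f) {
                        perror("tcp_autocorking");
                        return 1;
                }
                fputs("1\n", f);        /* 1 = enabled (default), 0 = disabled */
                fclose(f);
                return 0;
        }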
    
    Add a new SNMP counter: nstat -a | grep TcpExtTCPAutoCorking
    This counter is incremented every time we detect that the skb was
    underused and its flush was deferred.
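
    A small user-space sketch that reads the counter directly from
    /proc/net/netstat, assuming the field appears there as TCPAutoCorking
    on the TcpExt lines (i.e. the nstat name above minus its prefix):

        #include <stdio.h>
        #include <string.h>

        int main(void)
        {
                char names[4096], values[4096];
                FILE *f = fopen("/proc/net/netstat", "r");

                if (!f) {
                        perror("/proc/net/netstat");
                        return 1;
                }
                /* Each stanza is a pair of lines: field names, then values */
                while (fgets(names, sizeof(names), f) &&
                       fgets(values, sizeof(values), f)) {
                        char *sn, *sv, *n, *v;

                        if (strncmp(names, "TcpExt:", 7))
                                continue;
                        n = strtok_r(names, " \n", &sn);
                        v = strtok_r(values, " \n", &sv);
                        while (n && v) {
                                if (!strcmp(n, "TCPAutoCorking"))
                                        printf("TCPAutoCorking = %s\n", v);
                                n = strtok_r(NULL, " \n", &sn);
                                v = strtok_r(NULL, " \n", &sv);
                        }
                        break;
                }
                fclose(f);
                return 0;
        }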
    
    Tested:
    
    Interesting effects when using line buffered commands under ssh.
    
    Excellent performance results in terms of CPU usage and total throughput.
    
    lpq83:~# echo 1 >/proc/sys/net/ipv4/tcp_autocorking
    lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
    9410.39
    
     Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
    
          35209.439626 task-clock                #    2.901 CPUs utilized
                 2,294 context-switches          #    0.065 K/sec
                   101 CPU-migrations            #    0.003 K/sec
                 4,079 page-faults               #    0.116 K/sec
        97,923,241,298 cycles                    #    2.781 GHz                     [83.31%]
        51,832,908,236 stalled-cycles-frontend   #   52.93% frontend cycles idle    [83.30%]
        25,697,986,603 stalled-cycles-backend    #   26.24% backend  cycles idle    [66.70%]
       102,225,978,536 instructions              #    1.04  insns per cycle
                                                 #    0.51  stalled cycles per insn [83.38%]
        18,657,696,819 branches                  #  529.906 M/sec                   [83.29%]
            91,679,646 branch-misses             #    0.49% of all branches         [83.40%]
    
          12.136204899 seconds time elapsed
    
    lpq83:~# echo 0 >/proc/sys/net/ipv4/tcp_autocorking
    lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
    6624.89
    
     Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
          40045.864494 task-clock                #    3.301 CPUs utilized
                   171 context-switches          #    0.004 K/sec
                    53 CPU-migrations            #    0.001 K/sec
                 4,080 page-faults               #    0.102 K/sec
       111,340,458,645 cycles                    #    2.780 GHz                     [83.34%]
        61,778,039,277 stalled-cycles-frontend   #   55.49% frontend cycles idle    [83.31%]
        29,295,522,759 stalled-cycles-backend    #   26.31% backend  cycles idle    [66.67%]
       108,654,349,355 instructions              #    0.98  insns per cycle
                                                 #    0.57  stalled cycles per insn [83.34%]
        19,552,170,748 branches                  #  488.244 M/sec                   [83.34%]
           157,875,417 branch-misses             #    0.81% of all branches         [83.34%]
    
          12.130267788 seconds time elapsed
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>