1. 09 10月, 2018 5 次提交
  2. 06 10月, 2018 1 次提交
  3. 05 10月, 2018 4 次提交
  4. 03 10月, 2018 6 次提交
  5. 02 10月, 2018 5 次提交
  6. 30 9月, 2018 1 次提交
    • Y
      tcp: up initial rmem to 128KB and SYN rwin to around 64KB · a337531b
      Yuchung Cheng 提交于
      Previously TCP initial receive buffer is ~87KB by default and
      the initial receive window is ~29KB (20 MSS). This patch changes
      the two numbers to 128KB and ~64KB (rounding down to the multiples
      of MSS) respectively. The patch also simplifies the calculations s.t.
      the two numbers are directly controlled by sysctl tcp_rmem[1]:
      
        1) Initial receiver buffer budget (sk_rcvbuf): while this should
           be configured via sysctl tcp_rmem[1], previously tcp_fixup_rcvbuf()
           always override and set a larger size when a new connection
           establishes.
      
        2) Initial receive window in SYN: previously it is set to 20
           packets if MSS <= 1460. The number 20 was based on the initial
           congestion window of 10: the receiver needs twice amount to
           avoid being limited by the receive window upon out-of-order
           delivery in the first window burst. But since this only
           applies if the receiving MSS <= 1460, connection using large MTU
           (e.g. to utilize receiver zero-copy) may be limited by the
           receive window.
      
      With this patch TCP memory configuration is more straight-forward and
      more properly sized to modern high-speed networks by default. Several
      popular stacks have been announcing 64KB rwin in SYNs as well.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a337531b
  7. 27 9月, 2018 3 次提交
  8. 25 9月, 2018 1 次提交
  9. 22 9月, 2018 9 次提交
    • P
      net/ipfrag: let ip[6]frag_high_thresh in ns be higher than in init_net · 83619623
      Peter Oskolkov 提交于
      Currently, ip[6]frag_high_thresh sysctl values in new namespaces are
      hard-limited to those of the root/init ns.
      
      There are at least two use cases when it would be desirable to
      set the high_thresh values higher in a child namespace vs the global hard
      limit:
      
      - a security/ddos protection policy may lower the thresholds in the
        root/init ns but allow for a special exception in a child namespace
      - testing: a test running in a namespace may want to set these
        thresholds higher in its namespace than what is in the root/init ns
      
      The new behavior:
      
       # ip netns add testns
       # ip netns exec testns bash
      
       # sysctl -w net.ipv4.ipfrag_high_thresh=9000000
       net.ipv4.ipfrag_high_thresh = 9000000
      
       # sysctl net.ipv4.ipfrag_high_thresh
       net.ipv4.ipfrag_high_thresh = 9000000
      
       # sysctl -w net.ipv6.ip6frag_high_thresh=9000000
       net.ipv6.ip6frag_high_thresh = 9000000
      
       # sysctl net.ipv6.ip6frag_high_thresh
       net.ipv6.ip6frag_high_thresh = 9000000
      
      The old behavior:
      
       # ip netns add testns
       # ip netns exec testns bash
      
       # sysctl -w net.ipv4.ipfrag_high_thresh=9000000
       net.ipv4.ipfrag_high_thresh = 9000000
      
       # sysctl net.ipv4.ipfrag_high_thresh
       net.ipv4.ipfrag_high_thresh = 4194304
      
       # sysctl -w net.ipv6.ip6frag_high_thresh=9000000
       net.ipv6.ip6frag_high_thresh = 9000000
      
       # sysctl net.ipv6.ip6frag_high_thresh
       net.ipv6.ip6frag_high_thresh = 4194304
      Signed-off-by: NPeter Oskolkov <posk@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      83619623
    • E
      net/ipv4: avoid compile error in fib_info_nh_uses_dev · 075e264f
      Eric Dumazet 提交于
      net/ipv4/fib_frontend.c: In function 'fib_info_nh_uses_dev':
      net/ipv4/fib_frontend.c:322:6: error: unused variable 'ret' [-Werror=unused-variable]
      cc1: all warnings being treated as errors
      
      Fixes: 78f2756c ("net/ipv4: Move device validation to helper")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      075e264f
    • E
      tcp: switch tcp_internal_pacing() to tcp_wstamp_ns · c092dd5f
      Eric Dumazet 提交于
      Now TCP keeps track of tcp_wstamp_ns, recording the earliest
      departure time of next packet, we can remove duplicate code
      from tcp_internal_pacing()
      
      This removes one ktime_get_tai_ns() call, and a divide.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c092dd5f
    • E
      tcp: switch tcp and sch_fq to new earliest departure time model · ab408b6d
      Eric Dumazet 提交于
      TCP keeps track of tcp_wstamp_ns by itself, meaning sch_fq
      no longer has to do it.
      
      Thanks to this model, TCP can get more accurate RTT samples,
      since pacing no longer inflates them.
      
      This has the nice effect of removing some delays caused by FQ
      quantum mechanism, causing inflated max/P99 latencies.
      
      Also we might relax TCP Small Queue tight limits in the future,
      since this new model allow TCP to build bigger batches, since
      sch_fq (or a device with earliest departure time offload) ensure
      these packets will be delivered on time.
      
      Note that other protocols are not converted (they will probably
      never be) so sch_fq has still support for SO_MAX_PACING_RATE
      
      Tested:
      
      Test showing FQ pacing quantum artifact for low-rate flows,
      adding unexpected throttles for RPC flows, inflating max and P99 latencies.
      
      The parameters chosen here are to show what happens typically when
      a TCP flow has a reduced pacing rate (this can be caused by a reduced
      cwin after few losses, or/and rtt above few ms)
      
      MIBS="MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY,P99_LATENCY,STDDEV_LATENCY"
      Before :
      $ netperf -H 10.246.7.133 -t TCP_RR -Cc -T6,6 -- -q 2000000 -r 100,100 -o $MIBS
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.133 () port 0 AF_INET : first burst 0 : cpu bind
       Minimum Latency Microseconds,Mean Latency Microseconds,Maximum Latency Microseconds,99th Percentile Latency Microseconds,Stddev Latency Microseconds
      19,82.78,5279,3825,482.02
      
      After :
      $ netperf -H 10.246.7.133 -t TCP_RR -Cc -T6,6 -- -q 2000000 -r 100,100 -o $MIBS
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.133 () port 0 AF_INET : first burst 0 : cpu bind
      Minimum Latency Microseconds,Mean Latency Microseconds,Maximum Latency Microseconds,99th Percentile Latency Microseconds,Stddev Latency Microseconds
      20,49.94,128,63,3.18
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab408b6d
    • E
      tcp: switch internal pacing timer to CLOCK_TAI · fd2bca2a
      Eric Dumazet 提交于
      Next patch will use tcp_wstamp_ns to feed internal
      TCP pacing timer, so switch to CLOCK_TAI to share same base.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd2bca2a
    • E
      tcp: provide earliest departure time in skb->tstamp · d3edd06e
      Eric Dumazet 提交于
      Switch internal TCP skb->skb_mstamp to skb->skb_mstamp_ns,
      from usec units to nsec units.
      
      Do not clear skb->tstamp before entering IP stacks in TX,
      so that qdisc or devices can implement pacing based on the
      earliest departure time instead of socket sk->sk_pacing_rate
      
      Packets are fed with tcp_wstamp_ns, and following patch
      will update tcp_wstamp_ns when both TCP and sch_fq switch to
      the earliest departure time mechanism.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3edd06e
    • E
      tcp: add tcp_wstamp_ns socket field · 9799ccb0
      Eric Dumazet 提交于
      TCP will soon provide earliest departure time on TX skbs.
      It needs to track this in a new variable.
      
      tcp_mstamp_refresh() needs to update this variable, and
      became too big to stay an inline.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9799ccb0
    • E
      tcp: introduce tcp_skb_timestamp_us() helper · 2fd66ffb
      Eric Dumazet 提交于
      There are few places where TCP reads skb->skb_mstamp expecting
      a value in usec unit.
      
      skb->tstamp (aka skb->skb_mstamp) will soon store CLOCK_TAI nsec value.
      
      Add tcp_skb_timestamp_us() to provide proper conversion when needed.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2fd66ffb
    • Z
      ipv4: remove redundant null pointer check before kfree_skb · 1d08962f
      zhong jiang 提交于
      kfree_skb has taken the null pointer into account. hence it is safe
      to remove the redundant null pointer check before kfree_skb.
      Signed-off-by: Nzhong jiang <zhongjiang@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d08962f
  10. 21 9月, 2018 3 次提交
  11. 18 9月, 2018 1 次提交
    • S
      net/ipv4: defensive cipso option parsing · 076ed3da
      Stefan Nuernberger 提交于
      commit 40413955 ("Cipso: cipso_v4_optptr enter infinite loop") fixed
      a possible infinite loop in the IP option parsing of CIPSO. The fix
      assumes that ip_options_compile filtered out all zero length options and
      that no other one-byte options beside IPOPT_END and IPOPT_NOOP exist.
      While this assumption currently holds true, add explicit checks for zero
      length and invalid length options to be safe for the future. Even though
      ip_options_compile should have validated the options, the introduction of
      new one-byte options can still confuse this code without the additional
      checks.
      Signed-off-by: NStefan Nuernberger <snu@amazon.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Simon Veith <sveith@amazon.de>
      Cc: stable@vger.kernel.org
      Acked-by: NPaul Moore <paul@paul-moore.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      076ed3da
  12. 17 9月, 2018 1 次提交