1. 01 Nov, 2019 1 commit
  2. 16 Jul, 2019 1 commit
  3. 03 Jul, 2019 2 commits
    • samples/bpf: add sample program that periodically dumps TCP stats · 39533884
      Stanislav Fomichev committed
      Uses new RTT callback to dump stats every second.
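
The "every second" behaviour can be illustrated in plain user-space C. This is only a sketch of the throttling idea, not the sample program itself: the names `dump_state`, `should_dump` and `DUMP_INTERVAL_NS` are hypothetical, and it assumes the callback keeps a per-socket timestamp of the last dump and compares it with the current time on each RTT callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One second in nanoseconds; the interval comes from the commit message. */
#define DUMP_INTERVAL_NS 1000000000ULL

/* Hypothetical per-socket state: time of the most recent dump. */
struct dump_state {
    uint64_t last_dump_ns;
};

/*
 * Called on every RTT callback with the current time.
 * Returns true, and records now_ns, only when at least one full
 * interval has elapsed since the previous dump.
 */
static bool should_dump(struct dump_state *st, uint64_t now_ns)
{
    assert(st != NULL);

    if (now_ns - st->last_dump_ns < DUMP_INTERVAL_NS)
        return false;

    st->last_dump_ns = now_ns;
    return true;
}
```

With this shape, the callback can fire once per RTT while the stats are printed at most once per interval, which is consistent with the one-second spacing of the timestamps in the sample output.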
      
      $ mkdir -p /tmp/cgroupv2
      $ mount -t cgroup2 none /tmp/cgroupv2
      $ mkdir -p /tmp/cgroupv2/foo
      $ echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
      $ bpftool prog load ./tcp_dumpstats_kern.o /sys/fs/bpf/tcp_prog
      $ bpftool cgroup attach /tmp/cgroupv2/foo sock_ops pinned /sys/fs/bpf/tcp_prog
      $ bpftool prog tracelog
      $ # run neper/netperf/etc
      
      Used neper to compare performance with and without this program attached
      and didn't see any noticeable performance impact.
      
      Sample output:
        <idle>-0     [015] ..s.  2074.128800: 0: dsack_dups=0 delivered=242526
        <idle>-0     [015] ..s.  2074.128808: 0: delivered_ce=0 icsk_retransmits=0
        <idle>-0     [015] ..s.  2075.130133: 0: dsack_dups=0 delivered=323599
        <idle>-0     [015] ..s.  2075.130138: 0: delivered_ce=0 icsk_retransmits=0
        <idle>-0     [005] .Ns.  2076.131440: 0: dsack_dups=0 delivered=404648
        <idle>-0     [005] .Ns.  2076.131447: 0: delivered_ce=0 icsk_retransmits=0
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Priyaranjan Jha <priyarjha@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: Add support for fq's EDT to HBM · 71634d7f
      brakmo committed
      Adds support for fq's Earliest Departure Time to HBM (Host Bandwidth
      Manager). Includes a new BPF program supporting EDT, and also updates
      corresponding programs.
      
      It will drop packets with an EDT of more than 500us in the future
      unless the packet belongs to a flow with less than 2 packets in flight.
      This is done so each flow has at least 2 packets in flight, so they
      will not starve, and also to help prevent delayed ACK timeouts.
      
      It will also work with ECN enabled traffic, where the packets will be
      CE marked if their EDT is more than 50us in the future.
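
The two rules above can be sketched as a small decision function in user-space C. This is an illustration under stated assumptions, not the BPF program from the patch: the names (`edt_decide`, `edt_verdict`, the threshold macros) are hypothetical, the drop check is assumed to run before the marking check, and a packet that is exempt from dropping because its flow has fewer than 2 packets in flight is assumed to still be eligible for CE marking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum edt_verdict { EDT_PASS, EDT_MARK_CE, EDT_DROP };

/* Thresholds from the commit message, expressed in nanoseconds. */
#define EDT_DROP_THRESH_NS 500000ULL /* 500us */
#define EDT_MARK_THRESH_NS 50000ULL  /* 50us */
#define MIN_PKTS_IN_FLIGHT 2U

static_assert(EDT_MARK_THRESH_NS < EDT_DROP_THRESH_NS,
              "marking must start before dropping");

/*
 * now_ns:         current time
 * edt_ns:         earliest departure time assigned to the packet
 * pkts_in_flight: packets the flow currently has in flight
 * ecn_capable:    whether the packet can carry a CE mark
 */
static enum edt_verdict edt_decide(uint64_t now_ns, uint64_t edt_ns,
                                   unsigned int pkts_in_flight,
                                   bool ecn_capable)
{
    /* How far in the future the packet is scheduled (0 if already due). */
    uint64_t ahead_ns = edt_ns > now_ns ? edt_ns - now_ns : 0;

    /* Drop only if the flow keeps at least 2 packets in flight. */
    if (ahead_ns > EDT_DROP_THRESH_NS && pkts_in_flight >= MIN_PKTS_IN_FLIGHT)
        return EDT_DROP;

    if (ecn_capable && ahead_ns > EDT_MARK_THRESH_NS)
        return EDT_MARK_CE;

    return EDT_PASS;
}
```

For example, a non-ECN packet scheduled 600us ahead is dropped when its flow has 3 packets in flight, but passes when the flow has only 1.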
      
      The table below shows some performance numbers. The flows are back to
      back RPCs, with one server sending to another over either 2 or 4 flows.
      One flow is a 10KB RPC, the rest are 1MB RPCs. When there is more
      than one flow of a given RPC size, the numbers represent averages.
      
      The rate limit applies to all flows (they are in the same cgroup).
      Tests ending with "-edt" ran with the new BPF program supporting EDT.
      Tests ending with "-htb" ran on top of the HTB qdisc with the specified rate
      (i.e. no HBM). The other tests ran with the HBM BPF program included
      in the HBM patch-set.
      
      EDT has limited value when using DCTCP, but it helps in many cases when
      using Cubic. It usually achieves larger link utilization and lower
      99% latencies for the 1MB RPCs.
      HBM ends up queueing a lot of packets with its default parameter values,
      reducing the goodput of the 10KB RPCs and increasing their latency. Also,
      the RTTs seen by the flows are quite large.
      
                               Aggr              10K  10K  10K   1MB  1MB  1MB
               Limit           rate drops  RTT  rate  P90  P99  rate  P90  P99
      Test      rate  Flows    Mbps   %     us  Mbps   us   us  Mbps   ms   ms
      --------  ----  -----    ---- -----  ---  ---- ---- ----  ---- ---- ----
      cubic       1G    2       904  0.02  108   257  511  539   647 13.4 24.5
      cubic-edt   1G    2       982  0.01  156   239  656  967   743 14.0 17.2
      dctcp       1G    2       977  0.00  105   324  408  744   653 14.5 15.9
      dctcp-edt   1G    2       981  0.01  142   321  417  811   660 15.7 17.0
      cubic-htb   1G    2       919  0.00 1825    40 2822 4140   879  9.7  9.9
      
      cubic     200M    2       155  0.30  220    81  532  655    74  283  450
      cubic-edt 200M    2       188  0.02  222    87 1035 1095   101   84   85
      dctcp     200M    2       188  0.03  111    77  912  939   111   76  325
      dctcp-edt 200M    2       188  0.03  217    74 1416 1738   114   76   79
      cubic-htb 200M    2       188  0.00 5015     8 14ms 15ms   180   48   50
      
      cubic       1G    4       952  0.03  110   165  516  546   262   38  154
      cubic-edt   1G    4       973  0.01  190   111 1034 1314   287   65   79
      dctcp       1G    4       951  0.00  103   180  617  905   257   37   38
      dctcp-edt   1G    4       967  0.00  163   151  732 1126   272   43   55
      cubic-htb   1G    4       914  0.00 3249    13  7ms  8ms   300   29   34
      
      cubic       5G    4      4236  0.00  134   305  490  624  1310   10   17
      cubic-edt   5G    4      4865  0.00  156   306  425  759  1520   10   16
      dctcp       5G    4      4936  0.00  128   485  221  409  1484    7    9
      dctcp-edt   5G    4      4924  0.00  148   390  392  623  1508   11   26
      
      v1 -> v2: Incorporated Andrii's suggestions
      v2 -> v3: Incorporated Yonghong's suggestions
      v3 -> v4: Removed credit update that is not needed
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  4. 18 Jun, 2019 1 commit
  5. 15 Jun, 2019 1 commit
  6. 11 Jun, 2019 1 commit
  7. 06 Jun, 2019 1 commit
  8. 29 May, 2019 1 commit
  9. 28 Mar, 2019 1 commit
  10. 03 Mar, 2019 2 commits
    • bpf: User program for testing HBM · a1270fe9
      brakmo committed
      The program hbm creates a cgroup and attaches a BPF program to the
      cgroup for testing HBM (Host Bandwidth Manager) for egress traffic.
      One still needs to create network traffic. This can be done through
      netesto, netperf or iperf3.
      A follow-up patch contains a script to create traffic.
      
      USAGE: hbm [-d] [-l] [-n <id>] [-r <rate>] [-s] [-t <secs>]
                 [-w] [-h] [prog]
        Where:
         -d        Print BPF trace debug buffer
         -l        Also limit flows doing loopback
         -n <#>    To create cgroup "/hbm#" and attach prog. Default is /hbm1
                   This is convenient when testing HBM in more than 1 cgroup
         -r <rate> Rate limit in Mbps
         -s        Get HBM stats (marked, dropped, etc.)
         -t <time> Exit after specified seconds (default is 0)
         -w        Work conserving flag. cgroup can increase its bandwidth
                   beyond the rate limit specified while there is available
                   bandwidth. Current implementation assumes there is only
                   one NIC (eth0), but can be extended to support multiple NICs.
                   Currently only supported for egress. Note, this is just
                   a proof of concept.
         -h        Print this info
         prog      BPF program file name. Name defaults to hbm_out_kern.o
      
      More information about HBM can be found in the paper "BPF Host Resource
      Management" presented at the 2018 Linux Plumbers Conference, Networking Track
      (http://vger.kernel.org/lpc_net2018_talks/LPC%20BPF%20Network%20Resource%20Paper.pdf)
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Sample HBM BPF program to limit egress bw · 187d0738
      brakmo committed
      A cgroup skb BPF program to limit cgroup output bandwidth.
      It uses a modified virtual token bucket queue to limit average
      egress bandwidth. The implementation uses credits instead of tokens.
      Negative credits imply that queueing would have happened. (This is
      a virtual queue, so no queueing is done by it. However, queueing may
      occur at the actual qdisc, which is not used for rate limiting.)
      
      This implementation uses 3 thresholds, one to start marking packets and
      the other two to drop packets:
                                       CREDIT
             - <--------------------------|------------------------> +
                   |    |          |      0
                   |  Large pkt    |
                   |  drop thresh  |
        Small pkt drop             Mark threshold
            thresh
      
      The effect of marking depends on the type of packet:
      a) If the packet is ECN enabled, then the packet is ECN CE marked.
         The current mark threshold is tuned for DCTCP.
      b) Else, it is dropped if it is a large packet.
      
      If the credit is below the drop threshold, the packet is dropped.
      Note that dropping a packet through the BPF program does not trigger CWR
      (Congestion Window Reduction) in TCP packets. A future patch will add
      support for triggering CWR.
      
      This BPF program actually uses 2 drop thresholds, one threshold
      for larger packets (>= 120 bytes) and another for smaller packets. This
      protects smaller packets such as SYNs, ACKs, etc.
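
The credit scheme and its three thresholds can be sketched in user-space C as follows. This is a simplified model for illustration, not the BPF program from the patch: all names and the threshold values are hypothetical (only the 120-byte large-packet boundary comes from the text above), credit is assumed to be counted in bytes, and dropped packets are assumed not to consume credit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum hbm_verdict { HBM_PASS, HBM_MARK_CE, HBM_DROP };

/* Illustrative values in bytes; NOT taken from the patch. */
#define MAX_CREDIT              100000LL
#define MARK_THRESH            (-50000LL)
#define LARGE_PKT_DROP_THRESH (-150000LL)
#define SMALL_PKT_DROP_THRESH (-200000LL)
#define LARGE_PKT_BYTES         120U /* ">= 120 bytes", from the commit message */
#define NSEC_PER_SEC            1000000000ULL

struct vqueue {
    int64_t  credit;             /* bytes; negative means virtual backlog */
    uint64_t last_ns;            /* time of the previous update */
    uint64_t rate_bytes_per_sec; /* configured egress limit */
};

static enum hbm_verdict vqueue_packet(struct vqueue *q, uint64_t now_ns,
                                      uint32_t len, bool ecn_capable)
{
    assert(q != NULL);

    /* Earn credit for the elapsed time. The delta is clamped to one
     * second so the multiplication below cannot overflow for realistic
     * rates; the credit itself is capped at MAX_CREDIT anyway. */
    uint64_t delta_ns = now_ns - q->last_ns;
    if (delta_ns > NSEC_PER_SEC)
        delta_ns = NSEC_PER_SEC;
    q->last_ns = now_ns;
    q->credit += (int64_t)(delta_ns * q->rate_bytes_per_sec / NSEC_PER_SEC);
    if (q->credit > MAX_CREDIT)
        q->credit = MAX_CREDIT;

    bool large = len >= LARGE_PKT_BYTES;
    int64_t drop_thresh = large ? LARGE_PKT_DROP_THRESH
                                : SMALL_PKT_DROP_THRESH;

    enum hbm_verdict verdict = HBM_PASS;
    if (q->credit < drop_thresh) {
        verdict = HBM_DROP;
    } else if (q->credit < MARK_THRESH) {
        if (ecn_capable)
            verdict = HBM_MARK_CE;
        else if (large)
            verdict = HBM_DROP;
    }

    /* Assumption: only packets that are actually sent consume credit. */
    if (verdict != HBM_DROP)
        q->credit -= (int64_t)len;

    return verdict;
}
```

Because the small-packet drop threshold sits further into negative credit than the large-packet one, a 64-byte ACK can still pass at a credit level where a 1500-byte segment is already dropped.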
      
      The default bandwidth limit is set at 1Gbps but this can be changed by
      a user program through a shared BPF map. In addition, by default this BPF
      program does not limit connections using loopback. This behavior can be
      overwritten by the user program. There is also an option to calculate
      some statistics, such as percent of packets marked or dropped, which
      the user program can access.
      
      A later patch provides such a program (hbm.c).
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  11. 01 Mar, 2019 2 commits
  12. 26 Feb, 2019 1 commit
  13. 02 Feb, 2019 1 commit
  14. 16 Jan, 2019 1 commit
    • samples/bpf: workaround clang asm goto compilation errors · 6bf3bbe1
      Yonghong Song committed
      x86 compilation has required asm goto support since 4.17.
      Because clang does not support asm goto, commit b1ae32db
      ("x86/cpufeature: Guard asm_volatile_goto usage for BPF
      compilation") worked around the issue in 4.17 by permitting an
      alternative implementation without asm goto for clang.
      
      At 5.0, more asm goto usages appeared.
        [yhs@148 x86]$ egrep -r asm_volatile_goto
        include/asm/cpufeature.h:     asm_volatile_goto("1: jmp 6f\n"
        include/asm/jump_label.h:     asm_volatile_goto("1:"
        include/asm/jump_label.h:     asm_volatile_goto("1:"
        include/asm/rmwcc.h:  asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"     \
        include/asm/uaccess.h:        asm_volatile_goto("\n"                          \
        include/asm/uaccess.h:        asm_volatile_goto("\n"                          \
        [yhs@148 x86]$
      
      When compiling the samples/bpf directory, most bpf programs failed
      compilation with error messages like:
        In file included from /home/yhs/work/bpf-next/samples/bpf/xdp_sample_pkts_kern.c:2:
        In file included from /home/yhs/work/bpf-next/include/linux/ptrace.h:6:
        In file included from /home/yhs/work/bpf-next/include/linux/sched.h:15:
        In file included from /home/yhs/work/bpf-next/include/linux/sem.h:5:
        In file included from /home/yhs/work/bpf-next/include/uapi/linux/sem.h:5:
        In file included from /home/yhs/work/bpf-next/include/linux/ipc.h:9:
        In file included from /home/yhs/work/bpf-next/include/linux/refcount.h:72:
        /home/yhs/work/bpf-next/arch/x86/include/asm/refcount.h:70:9: error: 'asm goto' constructs are not supported yet
              return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
                     ^
        /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:67:2: note: expanded from macro 'GEN_BINARY_SUFFIXED_RMWcc'
              __GEN_RMWcc(op " %[val], %[var]\n\t" suffix, var, cc,           \
              ^
        /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:21:2: note: expanded from macro '__GEN_RMWcc'
              asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"             \
              ^
        /home/yhs/work/bpf-next/include/linux/compiler_types.h:188:37: note: expanded from macro 'asm_volatile_goto'
        #define asm_volatile_goto(x...) asm goto(x)
      
      Most of these usages do not even provide an alternative
      implementation, and it is not practical to make changes
      at each call site.
      
      This patch works around the asm goto issue by redefining the macro as follows:
        #define asm_volatile_goto(x...) asm volatile("invalid use of asm_volatile_goto")
      
      If asm_volatile_goto is not used by bpf programs, which is typically the case, nothing bad
      will happen. If asm_volatile_goto is used by bpf programs, which is incorrect, the compiler
      will issue an error since "invalid use of asm_volatile_goto" is not valid assembly code.
      
      With this patch, all bpf programs under samples/bpf can pass compilation.
      
      Note that bpf programs under tools/testing/selftests/bpf/ compiled fine as
      they do not access kernel internal headers.
      
      Fixes: e769742d ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")
      Fixes: 18fe5822 ("x86, asm: change the GEN_*_RMWcc() macros to not quote the condition")
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  15. 23 Dec, 2018 2 commits
  16. 21 Nov, 2018 1 commit
  17. 01 Sep, 2018 1 commit
  18. 27 Jul, 2018 3 commits
  19. 18 Jul, 2018 2 commits
  20. 27 Jun, 2018 1 commit
  21. 25 May, 2018 1 commit
  22. 15 May, 2018 3 commits
  23. 14 May, 2018 1 commit
  24. 11 May, 2018 4 commits
  25. 04 May, 2018 1 commit
  26. 29 Apr, 2018 1 commit
  27. 27 Apr, 2018 1 commit
  28. 19 Apr, 2018 1 commit