1. 28 Nov 2019, 1 commit
  2. 26 Nov 2019, 1 commit
  3. 25 Nov 2019, 15 commits
  4. 20 Nov 2019, 3 commits
    • selftests/bpf: Enforce no-ALU32 for test_progs-no_alu32 · 24f65050
      Authored by Andrii Nakryiko
      With the most recent Clang, alu32 is enabled by default if -mcpu=probe or
      -mcpu=v3 is specified. Use a separate build rule with -mcpu=v2 to enforce no
      ALU32 mode.
      Suggested-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20191120002510.4130605-1-andriin@fb.com
    • libbpf: Fix call relocation offset calculation bug · a0d7da26
      Authored by Andrii Nakryiko
      When relocating a subprogram call, libbpf doesn't take into account
      relo->text_off, which comes from the symbol's value. This generally works fine
      for subprograms implemented as static functions, but breaks for global
      functions.
      
      Taking a simplified test_pkt_access.c as an example:
      
      __attribute__ ((noinline))
      static int test_pkt_access_subprog1(volatile struct __sk_buff *skb)
      {
              return skb->len * 2;
      }
      
      __attribute__ ((noinline))
      static int test_pkt_access_subprog2(int val, volatile struct __sk_buff *skb)
      {
              return skb->len + val;
      }
      
      SEC("classifier/test_pkt_access")
      int test_pkt_access(struct __sk_buff *skb)
      {
              if (test_pkt_access_subprog1(skb) != skb->len * 2)
                      return TC_ACT_SHOT;
              if (test_pkt_access_subprog2(2, skb) != skb->len + 2)
                      return TC_ACT_SHOT;
              return TC_ACT_UNSPEC;
      }
      
      When compiled, we get two relocations pointing to the '.text' symbol, whose
      st_value is 0 (the beginning of the .text section):
      
      0000000000000008  000000050000000a R_BPF_64_32            0000000000000000 .text
      0000000000000040  000000050000000a R_BPF_64_32            0000000000000000 .text
      
      The test_pkt_access_subprog1 and test_pkt_access_subprog2 offsets (the targets
      of the two calls) are encoded within the call instructions' imm32 part as -1
      and 2, respectively:
      
      0000000000000000 test_pkt_access_subprog1:
             0:       61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0)
             1:       64 00 00 00 01 00 00 00 w0 <<= 1
             2:       95 00 00 00 00 00 00 00 exit
      
      0000000000000018 test_pkt_access_subprog2:
             3:       61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0)
             4:       04 00 00 00 02 00 00 00 w0 += 2
             5:       95 00 00 00 00 00 00 00 exit
      
      0000000000000000 test_pkt_access:
             0:       bf 16 00 00 00 00 00 00 r6 = r1
      ===>   1:       85 10 00 00 ff ff ff ff call -1
             2:       bc 01 00 00 00 00 00 00 w1 = w0
             3:       b4 00 00 00 02 00 00 00 w0 = 2
             4:       61 62 00 00 00 00 00 00 r2 = *(u32 *)(r6 + 0)
             5:       64 02 00 00 01 00 00 00 w2 <<= 1
             6:       5e 21 08 00 00 00 00 00 if w1 != w2 goto +8 <LBB0_3>
             7:       bf 61 00 00 00 00 00 00 r1 = r6
      ===>   8:       85 10 00 00 02 00 00 00 call 2
             9:       bc 01 00 00 00 00 00 00 w1 = w0
            10:       61 62 00 00 00 00 00 00 r2 = *(u32 *)(r6 + 0)
            11:       04 02 00 00 02 00 00 00 w2 += 2
            12:       b4 00 00 00 ff ff ff ff w0 = -1
            13:       1e 21 01 00 00 00 00 00 if w1 == w2 goto +1 <LBB0_3>
            14:       b4 00 00 00 02 00 00 00 w0 = 2
      0000000000000078 LBB0_3:
            15:       95 00 00 00 00 00 00 00 exit
      
      Now, if we compile the example with global functions, the setup changes.
      Relocations are now against the test_pkt_access_subprog1 and
      test_pkt_access_subprog2 symbols specifically, with test_pkt_access_subprog2
      pointing 24 bytes into its section (.text), i.e., 3 instructions in:
      
      0000000000000008  000000070000000a R_BPF_64_32            0000000000000000 test_pkt_access_subprog1
      0000000000000048  000000080000000a R_BPF_64_32            0000000000000018 test_pkt_access_subprog2
      
      Call instructions now encode offsets relative to their target function symbols,
      and both are set to -1:
      
      0000000000000000 test_pkt_access_subprog1:
             0:       61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0)
             1:       64 00 00 00 01 00 00 00 w0 <<= 1
             2:       95 00 00 00 00 00 00 00 exit
      
      0000000000000018 test_pkt_access_subprog2:
             3:       61 20 00 00 00 00 00 00 r0 = *(u32 *)(r2 + 0)
             4:       0c 10 00 00 00 00 00 00 w0 += w1
             5:       95 00 00 00 00 00 00 00 exit
      
      0000000000000000 test_pkt_access:
             0:       bf 16 00 00 00 00 00 00 r6 = r1
      ===>   1:       85 10 00 00 ff ff ff ff call -1
             2:       bc 01 00 00 00 00 00 00 w1 = w0
             3:       b4 00 00 00 02 00 00 00 w0 = 2
             4:       61 62 00 00 00 00 00 00 r2 = *(u32 *)(r6 + 0)
             5:       64 02 00 00 01 00 00 00 w2 <<= 1
             6:       5e 21 09 00 00 00 00 00 if w1 != w2 goto +9 <LBB2_3>
             7:       b4 01 00 00 02 00 00 00 w1 = 2
             8:       bf 62 00 00 00 00 00 00 r2 = r6
      ===>   9:       85 10 00 00 ff ff ff ff call -1
            10:       bc 01 00 00 00 00 00 00 w1 = w0
            11:       61 62 00 00 00 00 00 00 r2 = *(u32 *)(r6 + 0)
            12:       04 02 00 00 02 00 00 00 w2 += 2
            13:       b4 00 00 00 ff ff ff ff w0 = -1
            14:       1e 21 01 00 00 00 00 00 if w1 == w2 goto +1 <LBB2_3>
            15:       b4 00 00 00 02 00 00 00 w0 = 2
      0000000000000080 LBB2_3:
            16:       95 00 00 00 00 00 00 00 exit
      
      Thus the right formula for the target call offset after relocation must take
      into account the relocation's target symbol value (its offset within the
      section), the call instruction's imm32 offset, and, subtracting to get a
      relative instruction offset, the instruction index of the call instruction
      itself. All of that is shifted by the number of instructions in the main
      program, since all sub-programs are copied over after the main program.
      
      Convert a few selftests relying on bpf-to-bpf calls to use global functions
      instead of static ones.
      
      Fixes: 48cca7e4 ("libbpf: add support for bpf_call")
      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191119224447.3781271-1-andriin@fb.com
    • net-af_xdp: Use correct number of channels from ethtool · 3de88c91
      Authored by Luigi Rizzo
      Drivers use different fields to report the number of channels, so take
      the maximum of all data channels (rx, tx, combined) when determining the
      size of the xsk map. The current code used only 'combined', which was set
      to 0 in some drivers, e.g. mlx4.
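
      The idea can be sketched as follows. The struct here is a minimal stand-in
      for the relevant fields of ethtool's struct ethtool_channels (real code would
      use <linux/ethtool.h>), and the helper name is hypothetical:

```c
/* Minimal stand-in for the channel-count fields a driver reports via
 * ETHTOOL_GCHANNELS; the real struct is struct ethtool_channels. */
struct channels {
	unsigned int rx_count;
	unsigned int tx_count;
	unsigned int combined_count;
};

/* Hypothetical helper: size the xsk map by the largest of the data
 * channel counts, since drivers populate different fields (mlx4, for
 * example, leaves combined_count at 0 and reports rx/tx instead). */
static unsigned int xsk_map_channels(const struct channels *ch)
{
	unsigned int n = ch->rx_count;

	if (ch->tx_count > n)
		n = ch->tx_count;
	if (ch->combined_count > n)
		n = ch->combined_count;
	return n;
}
```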
      
      Tested: compiled and ran xdpsock -q 3 -r -S on mlx4
      Signed-off-by: Luigi Rizzo <lrizzo@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20191119001951.92930-1-lrizzo@google.com
  5. 19 Nov 2019, 8 commits
  6. 18 Nov 2019, 7 commits
  7. 17 Nov 2019, 1 commit
    • selftests: net: avoid ptl lock contention in tcp_mmap · 597b01ed
      Authored by Eric Dumazet
      tcp_mmap is used as a reference program for TCP rx zerocopy,
      so it is important to point out some potential issues.
      
      If multiple threads are concurrently using getsockopt(...
      TCP_ZEROCOPY_RECEIVE), there is a chance the low-level mm
      functions compete for a shared ptl lock if the vmas are arbitrarily placed.

      Instead of letting the mm layer place the chunks back to back,
      this patch enforces an alignment so that each thread uses
      a different ptl lock.
      
      Performance measured on a 100 Gbit NIC, with 8 tcp_mmap clients
      launched at the same time:
      
      $ for f in {1..8}; do ./tcp_mmap -H 2002:a05:6608:290:: & done
      
      In the following run, we reproduce the old behavior by requesting no alignment:
      
      $ tcp_mmap -sz -C $((128*1024)) -a 4096
      received 32768 MB (100 % mmap'ed) in 9.69532 s, 28.3516 Gbit
        cpu usage user:0.08634 sys:3.86258, 120.511 usec per MB, 171839 c-switches
      received 32768 MB (100 % mmap'ed) in 25.4719 s, 10.7914 Gbit
        cpu usage user:0.055268 sys:21.5633, 659.745 usec per MB, 9065 c-switches
      received 32768 MB (100 % mmap'ed) in 28.5419 s, 9.63069 Gbit
        cpu usage user:0.057401 sys:23.8761, 730.392 usec per MB, 14987 c-switches
      received 32768 MB (100 % mmap'ed) in 28.655 s, 9.59268 Gbit
        cpu usage user:0.059689 sys:23.8087, 728.406 usec per MB, 18509 c-switches
      received 32768 MB (100 % mmap'ed) in 28.7808 s, 9.55074 Gbit
        cpu usage user:0.066042 sys:23.4632, 718.056 usec per MB, 24702 c-switches
      received 32768 MB (100 % mmap'ed) in 28.8259 s, 9.5358 Gbit
        cpu usage user:0.056547 sys:23.6628, 723.858 usec per MB, 23518 c-switches
      received 32768 MB (100 % mmap'ed) in 28.8808 s, 9.51767 Gbit
        cpu usage user:0.059357 sys:23.8515, 729.703 usec per MB, 14691 c-switches
      received 32768 MB (100 % mmap'ed) in 28.8879 s, 9.51534 Gbit
        cpu usage user:0.047115 sys:23.7349, 725.769 usec per MB, 21773 c-switches
      
      With the new behavior (automatic alignment based on Hugepagesize),
      the system overhead is dramatically reduced.
      
      $ tcp_mmap -sz -C $((128*1024))
      received 32768 MB (100 % mmap'ed) in 13.5339 s, 20.3103 Gbit
        cpu usage user:0.122644 sys:3.4125, 107.884 usec per MB, 168567 c-switches
      received 32768 MB (100 % mmap'ed) in 16.0335 s, 17.1439 Gbit
        cpu usage user:0.132428 sys:3.55752, 112.608 usec per MB, 188557 c-switches
      received 32768 MB (100 % mmap'ed) in 17.5506 s, 15.6621 Gbit
        cpu usage user:0.155405 sys:3.24889, 103.891 usec per MB, 226652 c-switches
      received 32768 MB (100 % mmap'ed) in 19.1924 s, 14.3222 Gbit
        cpu usage user:0.135352 sys:3.35583, 106.542 usec per MB, 207404 c-switches
      received 32768 MB (100 % mmap'ed) in 22.3649 s, 12.2906 Gbit
        cpu usage user:0.142429 sys:3.53187, 112.131 usec per MB, 250225 c-switches
      received 32768 MB (100 % mmap'ed) in 22.5336 s, 12.1986 Gbit
        cpu usage user:0.140654 sys:3.61971, 114.757 usec per MB, 253754 c-switches
      received 32768 MB (100 % mmap'ed) in 22.5483 s, 12.1906 Gbit
        cpu usage user:0.134035 sys:3.55952, 112.718 usec per MB, 252997 c-switches
      received 32768 MB (100 % mmap'ed) in 22.6442 s, 12.139 Gbit
        cpu usage user:0.126173 sys:3.71251, 117.147 usec per MB, 253728 c-switches
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Arjun Roy <arjunroy@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 16 Nov 2019, 4 commits