1. 03 Jun, 2021 4 commits
  2. 01 Jun, 2021 1 commit
  3. 29 May, 2021 2 commits
  4. 27 May, 2021 1 commit
  5. 26 May, 2021 11 commits
    • D
      Merge branch 'bpf-xdp-bcast' · aa7f1f03
      Daniel Borkmann committed
      Hangbin Liu says:
      
      ====================
      This patchset is a new implementation for XDP multicast support based
      on my previous 2 maps implementation[1]. The reason is that Daniel thinks
      the exclude map implementation is missing proper bond support in XDP
      context. And there is a plan to add native XDP bonding support. Adding
      an exclude map in the helper also increases the complexity of the verifier and
      has drawbacks on performance.
      
      The new implementation just adds two new flags, BPF_F_BROADCAST and
      BPF_F_EXCLUDE_INGRESS, to extend xdp_redirect_map for broadcast support.
      
      With BPF_F_BROADCAST the packet will be broadcast to all the interfaces
      in the map. With BPF_F_EXCLUDE_INGRESS the ingress interface will be
      excluded when broadcasting.
      
      The patchv11 link is here [2].
      
        [1] https://lore.kernel.org/bpf/20210223125809.1376577-1-liuhangbin@gmail.com
        [2] https://lore.kernel.org/bpf/20210513070447.1878448-1-liuhangbin@gmail.com
      
      v12: As Daniel pointed out:
        a) defined as const u64 for flag_mask and action_mask in
           __bpf_xdp_redirect_map()
        b) remove BPF_F_ACTION_MASK in uapi header
        c) remove EXPORT_SYMBOL_GPL for xdpf_clone()
      
      v11:
        a) Use unlikely() when checking if this is for broadcast redirecting.
        b) Fix a tracepoint NULL pointer issue Jesper found
        c) Remove BPF_F_REDIR_MASK and just OR the flags to make it clearer to the
           reader which flags we are using
        d) Add the performance numbers with multiple veth interfaces to the patch 01
           description.
        e) Remove some sleeps to reduce the testing time in patch 04. Restructure the
           test and make clear which flags we are testing.
      
      v10: use READ/WRITE_ONCE when read/write map instead of xchg()
      v9: Update patch 01 commit description
      v8: use hlist_for_each_entry_rcu() when looping the devmap hash objs
      v7: No need to free xdpf in dev_map_enqueue_clone() if xdpf_clone failed.
      v6: Fix a skb leak in the error path for generic XDP
      v5: Just walk the map directly to get interfaces, as get_next_key() of devmap
          hash may restart looping from the first key if the device gets removed.
          After the update the performance has improved 10% compared with v4.
      v4: Fix flags never cleared issue in patch 02. Update selftest to cover this.
      v3: Rebase the code based on latest bpf-next
      v2: fix flag renaming issue in patch 02
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      aa7f1f03
    • H
      selftests/bpf: Add xdp_redirect_multi test · d2329247
      Hangbin Liu committed
      Add a bpf selftest for the new helper xdp_redirect_map_multi(). In this
      test there are 3 forward groups and 1 exclude group. The test will
      redirect each interface's packets to all the interfaces in the forward
      group, and exclude the interface in the exclude map.
      
      Two maps (DEVMAP, DEVMAP_HASH) and two XDP modes (generic, driver) will
      be tested. The XDP egress program will also be tested by setting the packet
      source MAC to the egress interface's MAC address.
      
      More test details can be found in the test script. Here is
      the test result.
      ]# time ./test_xdp_redirect_multi.sh
      Pass: xdpgeneric arp(F_BROADCAST) ns1-1
      Pass: xdpgeneric arp(F_BROADCAST) ns1-2
      Pass: xdpgeneric arp(F_BROADCAST) ns1-3
      Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-1
      Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-2
      Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-3
      Pass: xdpgeneric IPv6 (no flags) ns1-1
      Pass: xdpgeneric IPv6 (no flags) ns1-2
      Pass: xdpdrv arp(F_BROADCAST) ns1-1
      Pass: xdpdrv arp(F_BROADCAST) ns1-2
      Pass: xdpdrv arp(F_BROADCAST) ns1-3
      Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-1
      Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-2
      Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-3
      Pass: xdpdrv IPv6 (no flags) ns1-1
      Pass: xdpdrv IPv6 (no flags) ns1-2
      Pass: xdpegress mac ns1-2
      Pass: xdpegress mac ns1-3
      Summary: PASS 18, FAIL 0
      
      real    1m18.321s
      user    0m0.123s
      sys     0m0.350s
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210519090747.1655268-5-liuhangbin@gmail.com
      d2329247
    • H
      sample/bpf: Add xdp_redirect_map_multi for redirect_map broadcast test · e48cfe4b
      Hangbin Liu committed
      This is a sample for XDP redirect broadcast. In the sample we can forward
      all packets between given interfaces. There is also an option -X that
      enables a 2nd xdp_prog on the egress interface.
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210519090747.1655268-4-liuhangbin@gmail.com
      e48cfe4b
    • H
      xdp: Extend xdp_redirect_map with broadcast support · e624d4ed
      Hangbin Liu committed
      This patch adds two flags BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS to
      extend xdp_redirect_map for broadcast support.
      
      With BPF_F_BROADCAST the packet will be broadcast to all the interfaces
      in the map. With BPF_F_EXCLUDE_INGRESS the ingress interface will be
      excluded when broadcasting.
      
      When getting the devices in dev hash map via dev_map_hash_get_next_key(),
      there is a possibility that we fall back to the first key when a device
      was removed. This will duplicate packets on some interfaces. So just walk
      all the buckets to avoid this issue. For the dev array map, we also walk the
      whole map to find valid interfaces.
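
      The walk described above can be pictured with a small userspace model. This is
      only a sketch under stated assumptions, not the kernel code: the struct, the
      function and the SKETCH_F_* flag names are hypothetical stand-ins (the flag
      values are placeholders, not the uapi values), and an array slot holding 0 is
      assumed to mean "no device".

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS;
 * the numeric values are placeholders chosen for this sketch only. */
#define SKETCH_F_BROADCAST       (1u << 0)
#define SKETCH_F_EXCLUDE_INGRESS (1u << 1)

/* Hypothetical model of an array devmap: a slot value of 0 means "empty",
 * any other value is the ifindex of a target device. */
struct sketch_devmap {
	const uint32_t *ifindex;
	size_t max_entries;
};

/*
 * Walk every slot exactly once and collect the devices that would get a
 * copy of the frame. Returns the number of targets written to out[]
 * (at most out_len).
 */
size_t sketch_broadcast_targets(const struct sketch_devmap *map,
				uint32_t ingress_ifindex, uint32_t flags,
				uint32_t *out, size_t out_len)
{
	size_t n = 0;

	if (!(flags & SKETCH_F_BROADCAST))
		return 0; /* a non-broadcast redirect targets a single key instead */

	for (size_t i = 0; i < map->max_entries && n < out_len; i++) {
		uint32_t dev = map->ifindex[i];

		if (dev == 0)
			continue; /* skip empty slots */
		if ((flags & SKETCH_F_EXCLUDE_INGRESS) && dev == ingress_ifindex)
			continue; /* do not send the frame back to its ingress device */
		out[n++] = dev;
	}
	return n;
}
```

      Because each slot is visited once in a single pass, no device can be reported
      twice, which is the property the commit message is after when it avoids the
      get_next_key() style of iteration.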
      
      Function bpf_clear_redirect_map() was removed in
      commit ee75aef2 ("bpf, xdp: Restructure redirect actions").
      Add it back as we need to use ri->map again.
      
      With test topology:
        +-------------------+             +-------------------+
        | Host A (i40e 10G) |  ---------- | eno1(i40e 10G)    |
        +-------------------+             |                   |
                                          |   Host B          |
        +-------------------+             |                   |
        | Host C (i40e 10G) |  ---------- | eno2(i40e 10G)    |
        +-------------------+             |                   |
                                          |          +------+ |
                                          | veth0 -- | Peer | |
                                          | veth1 -- |      | |
                                          | veth2 -- |  NS  | |
                                          |          +------+ |
                                          +-------------------+
      
      On Host A:
       # pktgen/pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -s 64
      
      On Host B(Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 128G Memory):
      Use xdp_redirect_map and xdp_redirect_map_multi in samples/bpf for testing.
      All the veth peers in the NS have an XDP_DROP program loaded. The
      forward_map max_entries in xdp_redirect_map_multi is modified to 4.
      
      Testing the performance impact on the regular xdp_redirect path with and
      without patch (to check impact of additional check for broadcast mode):
      
      Version          | Test                                | Generic | Native
      5.12 rc4         | redirect_map        i40e->i40e      |    2.0M |  9.7M
      5.12 rc4         | redirect_map        i40e->veth      |    1.7M | 11.8M
      5.12 rc4 + patch | redirect_map        i40e->i40e      |    2.0M |  9.6M
      5.12 rc4 + patch | redirect_map        i40e->veth      |    1.7M | 11.7M
      
      Testing the performance when cloning packets with the redirect_map_multi
      test, using a redirect map size of 4, filled with 1-3 devices:
      
      Version          | Test                                | Generic | Native
      5.12 rc4 + patch | redirect_map multi  i40e->veth (x1) |    1.7M | 11.4M
      5.12 rc4 + patch | redirect_map multi  i40e->veth (x2) |    1.1M |  4.3M
      5.12 rc4 + patch | redirect_map multi  i40e->veth (x3) |    0.8M |  2.6M
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Link: https://lore.kernel.org/bpf/20210519090747.1655268-3-liuhangbin@gmail.com
      e624d4ed
    • J
      bpf: Run devmap xdp_prog on flush instead of bulk enqueue · cb261b59
      Jesper Dangaard Brouer committed
      This changes the devmap XDP program support to run the program when the
      bulk queue is flushed instead of before the frame is enqueued. This has
      a couple of benefits:
      
      - It "sorts" the packets by destination devmap entry, and then runs the
        same BPF program on all the packets in sequence. This ensures that we
        keep the XDP program and destination device properties hot in I-cache.
      
      - It makes the multicast implementation simpler because it can just
        enqueue packets using bq_enqueue() without having to deal with the
        devmap program at all.
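
      The idea of deferring the program to flush time can be sketched in plain C.
      This is a simplified userspace model under assumptions, not the kernel's
      bq_enqueue()/bq_xmit_all() code: every sketch_* name, the queue size and the
      "program" callback type are hypothetical.

```c
#include <stddef.h>

#define SKETCH_BULK_SIZE 16 /* hypothetical queue depth for this sketch */

struct sketch_frame {
	int len;
};

/* Hypothetical per-destination bulk queue. */
struct sketch_bq {
	struct sketch_frame *q[SKETCH_BULK_SIZE];
	size_t count;
};

/* Stand-in for a devmap program: returns nonzero to keep the frame. */
typedef int (*sketch_prog_fn)(const struct sketch_frame *frame);

/* Enqueue without running any program. Returns 0, or -1 if the queue is full. */
int sketch_bq_enqueue(struct sketch_bq *bq, struct sketch_frame *frame)
{
	if (bq->count == SKETCH_BULK_SIZE)
		return -1;
	bq->q[bq->count++] = frame;
	return 0;
}

/*
 * Run the program over the whole batch at flush time. Frames that survive
 * are written to tx[] (which must hold SKETCH_BULK_SIZE entries) and the
 * queue is emptied. Returns the number of frames left to transmit.
 */
size_t sketch_bq_flush(struct sketch_bq *bq, sketch_prog_fn prog,
		       struct sketch_frame **tx)
{
	size_t kept = 0;

	for (size_t i = 0; i < bq->count; i++) {
		struct sketch_frame *frame = bq->q[i];

		if (prog && !prog(frame))
			continue; /* dropped here, so the earlier enqueue was wasted work */
		tx[kept++] = frame;
	}
	bq->count = 0;
	return kept;
}

/* Example program for the sketch: drop frames shorter than 64 bytes. */
int sketch_drop_runts(const struct sketch_frame *frame)
{
	return frame->len >= 64;
}
```

      In this model the same callback runs back to back over all queued frames for
      one destination, and the enqueue path never needs to know whether a program is
      attached.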
      
      The drawback is that if the devmap program drops the packet, the enqueue
      step is redundant. However, arguably this is mostly visible in a
      micro-benchmark, and with more mixed traffic the I-cache benefit should
      win out. The performance impact of just this patch is as follows:
      
      Using two 10Gb i40e NICs, redirecting from one to the other, or into a veth
      interface whose peer does XDP_DROP. With xdp_redirect_map in samples/bpf,
      packets are sent via the pktgen command:
      ./pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -t 10 -s 64
      
      There is a deviation of about +/- 0.1M for native testing. Performance
      improved for the base case, but dropped back somewhat with an XDP devmap
      program attached.
      
      Version          | Test                           | Generic | Native | Native + 2nd xdp_prog
      5.12 rc4         | xdp_redirect_map   i40e->i40e  |    1.9M |   9.6M |  8.4M
      5.12 rc4         | xdp_redirect_map   i40e->veth  |    1.7M |  11.7M |  9.8M
      5.12 rc4 + patch | xdp_redirect_map   i40e->i40e  |    1.9M |   9.8M |  8.0M
      5.12 rc4 + patch | xdp_redirect_map   i40e->veth  |    1.7M |  12.0M |  9.4M
      
      When bq_xmit_all() is called from bq_enqueue(), another packet will
      always be enqueued immediately after, so clearing dev_rx, xdp_prog and
      flush_node in bq_xmit_all() is redundant. Move the clear to __dev_flush(),
      and only check them once in bq_enqueue() since they are all modified
      together.
      
      This change also has the side effect of extending the lifetime of the
      RCU-protected xdp_prog that lives inside the devmap entries: Instead of
      just living for the duration of the XDP program invocation, the
      reference now lives all the way until the bq is flushed. This is safe
      because the bq flush happens at the end of the NAPI poll loop, so
      everything happens between a local_bh_disable()/local_bh_enable() pair.
      However, this is by no means obvious from looking at the call sites; in
      particular, some drivers have an additional rcu_read_lock() around only
      the XDP program invocation, which only confuses matters further.
      Cleaning this up will be done in a separate patch series.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20210519090747.1655268-2-liuhangbin@gmail.com
      cb261b59
    • A
      Merge branch 'libbpf: error reporting changes for v1.0' · 21703cf7
      Alexei Starovoitov committed
      Andrii Nakryiko says:
      
      ====================
      
      Implement error reporting changes discussed in "Libbpf: the road to v1.0"
      ([0]) document.
      
      Libbpf gets a new API, libbpf_set_strict_mode(), which accepts a set of flags
      that turn on a set of libbpf 1.0 changes that might be potentially breaking.
      It's possible to opt in to all current and future 1.0 features by specifying
      the LIBBPF_STRICT_ALL flag.
      
      When some of the 1.0 "features" are requested, libbpf APIs might behave
      differently. In this patch set a first set of changes is implemented, all
      related to the way libbpf returns errors. See individual patches for details.
      
      Patch #1 adds a no-op libbpf_set_strict_mode() functionality to enable
      updating selftests.
      
      Patch #2 gets rid of all the bad code patterns that will break in libbpf 1.0
      (exact -1 comparison for low-level APIs, direct IS_ERR() macro usage to check
      pointer-returning APIs for error, etc). These changes make selftests work in
      both legacy and 1.0 libbpf modes. Selftests also opt in to 100% libbpf 1.0
      mode to automatically gain all the subsequent changes, which will come in
      follow-up patches.
      
      Patch #3 streamlines error reporting for low-level APIs wrapping bpf() syscall.
      
      Patch #4 streamlines errors for all the remaining APIs.
      
      Patch #5 ensures that BPF skeletons propagate errors properly as well, as
      currently on error some APIs will return NULL with no way of checking exact
      error code.
      
        [0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY
      
      v1->v2:
        - move libbpf_set_strict_mode() implementation to patch #1, where it belongs
          (Alexei);
        - add acks, slight rewording of commit messages.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      21703cf7
    • A
      bpftool: Set errno on skeleton failures and propagate errors · 9c6c0449
      Andrii Nakryiko committed
      Follow libbpf's error handling conventions and pass through errors and errno
      properly. Skeleton code always returned NULL on errors (not ERR_PTR(err)), so
      there are no backwards compatibility concerns. But now we also set errno
      properly, so it's possible to distinguish different reasons for failure, if
      necessary.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210525035935.1461796-6-andrii@kernel.org
      9c6c0449
    • A
      libbpf: Streamline error reporting for high-level APIs · e9fc3ce9
      Andrii Nakryiko committed
      Implement changes to error reporting for high-level libbpf APIs to make them
      less surprising and less error-prone to users:
        - in all the cases when error happens, errno is set to an appropriate error
          value;
        - in libbpf 1.0 mode, all pointer-returning APIs return NULL on error and
          error code is communicated through errno; this applies both to APIs that
          already returned NULL before (so now they communicate more detailed error
          codes), as well as for many APIs that used ERR_PTR() macro and encoded
          error numbers as fake pointers.
        - in legacy (default) mode, those APIs that were returning ERR_PTR(err),
          continue doing so, but still set errno.
      
      With these changes, errno can always be used to extract the actual error,
      regardless of legacy or libbpf 1.0 modes. This is utilized internally in
      libbpf in places where libbpf uses its own high-level APIs.
      libbpf_get_error() is adapted to handle both cases completely transparently to
      end-users (and is used by libbpf consistently as well).
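
      The two failure conventions, and a single check that copes with both, can be
      sketched as follows. This is an illustration under assumptions rather than
      libbpf's implementation: all sketch_* names are hypothetical, and the 4095
      limit follows the usual ERR_PTR()-style encoding.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative error code as a fake pointer, in the style of ERR_PTR(). */
static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if ptr falls in the top SKETCH_MAX_ERRNO addresses, i.e. encodes an error. */
static bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* What a pointer-returning API would hand back on failure in each mode. */
void *sketch_fail(int err, bool strict)
{
	errno = err; /* errno is set in both modes */
	return strict ? NULL : sketch_err_ptr(-err);
}

/* One check that works in both modes, in the spirit of libbpf_get_error(). */
long sketch_get_error(const void *ptr)
{
	if (sketch_is_err(ptr))
		return (long)(intptr_t)ptr; /* legacy: error encoded in the pointer */
	if (ptr == NULL)
		return -errno; /* 1.0 mode: error carried by errno */
	return 0;
}
```

      A caller written against sketch_get_error() does not need to know which mode
      is active, which mirrors the transparency the commit message describes.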
      
      More context, justification, and discussion can be found in "Libbpf: the road
      to v1.0" document ([0]).
      
        [0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210525035935.1461796-5-andrii@kernel.org
      e9fc3ce9
    • A
      libbpf: Streamline error reporting for low-level APIs · f12b6543
      Andrii Nakryiko committed
      Ensure that low-level APIs behave uniformly across libbpf as follows:
        - in case of an error, errno is always set to the correct error code;
        - when libbpf 1.0 mode is enabled with LIBBPF_STRICT_DIRECT_ERRS option to
          libbpf_set_strict_mode(), return -Exxx error value directly, instead of -1;
        - by default, until libbpf 1.0 is released, keep returning -1 directly.
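
      The return-value rules in the list above can be sketched like this. It is a
      hypothetical model, not libbpf code: sketch_set_direct_errs() stands in for
      opting in via libbpf_set_strict_mode(), and sketch_low_level_ret() for
      whatever internal helper translates a failure into the returned value.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical switch standing in for the LIBBPF_STRICT_DIRECT_ERRS opt-in. */
static bool sketch_direct_errs;

void sketch_set_direct_errs(bool on)
{
	sketch_direct_errs = on;
}

/*
 * Translate a result (negative error code on failure, >= 0 on success)
 * into what a low-level wrapper would return to its caller.
 */
int sketch_low_level_ret(int ret)
{
	if (ret < 0) {
		errno = -ret; /* errno is always set on error */
		return sketch_direct_errs ? ret : -1;
	}
	return ret;
}
```

      Callers that test for "< 0" instead of comparing against -1 behave the same
      way under both conventions.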
      
      More context, justification, and discussion can be found in "Libbpf: the road
      to v1.0" document ([0]).
      
        [0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210525035935.1461796-4-andrii@kernel.org
      f12b6543
    • A
      selftests/bpf: Turn on libbpf 1.0 mode and fix all IS_ERR checks · bad2e478
      Andrii Nakryiko committed
      Turn on libbpf 1.0 mode. Fix all the explicit IS_ERR checks that now will be
      broken because libbpf returns NULL on error (and sets errno). Fix
      ASSERT_OK_PTR and ASSERT_ERR_PTR to work for both old and new modes and
      use them throughout selftests. This is trivial to do by using
      libbpf_get_error() API that all libbpf users are supposed to use, instead of
      IS_ERR checks.
      
      A bunch of checks also did explicit -1 comparison for various fd-returning
      APIs. Such checks are replaced with >= 0 or < 0 cases.
      
      There were also a few misuses of bpf_object__find_map_by_name() in test_maps.
      Those are fixed in this patch as well.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210525035935.1461796-3-andrii@kernel.org
      bad2e478
    • A
      libbpf: Add libbpf_set_strict_mode() API to turn on libbpf 1.0 behaviors · 5981881d
      Andrii Nakryiko committed
      Add libbpf_set_strict_mode() API that allows applications to simulate libbpf
      1.0 breaking changes before libbpf 1.0 is released. This will help users
      migrate gradually and with confidence.
      
      For now only ALL or NONE options are available; subsequent patches will add
      more flags. This patch is preliminary for selftests/bpf changes.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210525035935.1461796-2-andrii@kernel.org
      5981881d
  6. 25 May, 2021 9 commits
  7. 24 May, 2021 9 commits
  8. 22 May, 2021 3 commits