1. 23 Jan, 2020: 3 commits
    • bpf: Introduce dynamic program extensions · be8704ff
      Alexei Starovoitov committed
      Introduce dynamic program extensions. Users can load additional BPF
      functions and replace global functions in previously loaded BPF programs while
      these programs are executing.
      
      Global functions are verified individually by the verifier based on their types only.
      Hence a global function in the new program whose types match the older function can
      safely replace that corresponding function.
      
      This new function/program is called 'an extension' of the old program. At load time
      the verifier uses the (attach_prog_fd, attach_btf_id) pair to identify the function
      to be replaced. The extension program inherits its BPF program type from the target
      program; technically, bpf_verifier_ops is copied from the target program.
      The BPF_PROG_TYPE_EXT program type is a placeholder. It has empty verifier_ops.
      The extension program can call the same bpf helper functions as target program.
      Single BPF_PROG_TYPE_EXT type is used to extend XDP, SKB and all other program
      types. The verifier allows only one level of replacement, meaning that the
      extension program cannot recursively extend an extension. That also means that
      the maximum stack size increases from 512 to 1024 bytes and the maximum
      function nesting level from 8 to 16. Programs don't always consume that
      much; stack usage is determined by the number of on-stack variables used by
      the program. The verifier could have enforced the 512-byte limit for the combined
      original plus extension program, but that would make for a difficult user
      experience. The main use case for extensions is to provide a generic mechanism
      to plug external programs into a policy program, or for function call chaining.
      
      BPF trampoline is used to track both fentry/fexit and program extensions
      because both are using the same nop slot at the beginning of every BPF
      function. Attaching fentry/fexit to a function that was replaced is not
      allowed. The opposite is true as well: replacing a function that is currently
      being analyzed with fentry/fexit is not allowed. The executable page allocated
      by BPF trampoline is not used by program extensions. This inefficiency will be
      optimized in future patches.
      
      Function-by-function verification of global functions supports only scalars and
      pointers to context. Hence program extensions are supported only for that class
      of global functions. In the future the verifier will be extended with
      support for pointers to structures, arrays with sizes, etc.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20200121005348.2769920-2-ast@kernel.org
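      The load-time rules above can be modelled in a few lines of ordinary user-space C.
      This is a toy sketch, not kernel or libbpf code: struct prog, attach_extension()
      and the PROG_TYPE_* values are hypothetical, and the proto field is an assumed
      stand-in for the BTF type comparison. Only the 512-byte stack limit and the
      nesting depth of 8 come from the text.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Per-program limits quoted in the commit text. */
      #define STACK_LIMIT   512 /* bytes */
      #define NESTING_LIMIT 8   /* call frames */

      /* Hypothetical program types for this model only. */
      enum prog_type { PROG_TYPE_XDP, PROG_TYPE_SKB, PROG_TYPE_EXT };

      /* Hypothetical stand-in for a loaded program and the global function it
       * defines or replaces; 'proto' models that function's BTF type. */
      struct prog {
          enum prog_type type;     /* type given at load time */
          enum prog_type eff_type; /* type whose helpers the program may call */
          int proto;               /* prototype of the (replaced) global function */
          bool is_extension;
      };

      /* Model of the load-time checks: returns 0 on success, -1 if rejected. */
      static int attach_extension(struct prog *ext, const struct prog *target)
      {
          assert(ext && target);
          if (ext->type != PROG_TYPE_EXT)
              return -1;
          /* Only one level of replacement: an extension cannot be extended. */
          if (target->is_extension)
              return -1;
          /* Global functions are verified by type only, so types must match. */
          if (ext->proto != target->proto)
              return -1;
          /* The extension inherits the target's type, and thus its helpers. */
          ext->eff_type = target->eff_type;
          ext->is_extension = true;
          return 0;
      }

      /* Worst case: original program plus one extension, each at its limit. */
      static int worst_case_stack(void)   { return 2 * STACK_LIMIT; }
      static int worst_case_nesting(void) { return 2 * NESTING_LIMIT; }
      ```

      In this model an extension attached to an XDP target reports PROG_TYPE_XDP as its
      effective type, a second extension aimed at the first one is rejected, and the
      worst-case limits come out at 1024 bytes and 16 frames.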
    • bpf, btf: Always output invariant hit in pahole DWARF to BTF transform · 2a67a6cc
      Chris Down committed
      When trying to compile with CONFIG_DEBUG_INFO_BTF enabled, I got this
      error:
      
          % make -s
          Failed to generate BTF for vmlinux
          Try to disable CONFIG_DEBUG_INFO_BTF
          make[3]: *** [vmlinux] Error 1
      
      Compiling again without -s shows the true error (that pahole is
      missing), but since this is fatal, we should show the error
      unconditionally on stderr as well, not silence it using the `info`
      function. With this patch:
      
          % make -s
          BTF: .tmp_vmlinux.btf: pahole (pahole) is not available
          Failed to generate BTF for vmlinux
          Try to disable CONFIG_DEBUG_INFO_BTF
          make[3]: *** [vmlinux] Error 1
      Signed-off-by: Chris Down <chris@chrisdown.name>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200122000110.GA310073@chrisdown.name
    • selftests/bpf: Build urandom_read with LDFLAGS and LDLIBS · 1222653c
      Daniel Díaz committed
      During cross-compilation, it was discovered that LDFLAGS and
      LDLIBS were not being used while building binaries, leading
      to defaults which were not necessarily correct.
      
      OpenEmbedded reported this kind of problem:
      
        ERROR: QA Issue: No GNU_HASH in the ELF binary [...], didn't pass LDFLAGS?
      Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
  2. 22 Jan, 2020: 4 commits
  3. 21 Jan, 2020: 15 commits
  4. 18 Jan, 2020: 4 commits
  5. 17 Jan, 2020: 7 commits
    • bpf: Remove set but not used variable 'first_key' · 81f2b572
      YueHaibing committed
      kernel/bpf/syscall.c: In function generic_map_lookup_batch:
      kernel/bpf/syscall.c:1339:7: warning: variable first_key set but not used [-Wunused-but-set-variable]
      
      It is never used, so remove it.
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Brian Vazquez <brianvv@google.com>
      Link: https://lore.kernel.org/bpf/20200116145300.59056-1-yuehaibing@huawei.com
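      A generic illustration of the warning quoted above, using hypothetical functions
      rather than the syscall.c code: a variable that is only ever assigned, never read,
      is what -Wunused-but-set-variable flags, and deleting it cannot change behaviour.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Before: 'first' is written but never read, so GCC reports it under
       * -Wunused-but-set-variable (enabled by -Wall). */
      static size_t count_nonzero_before(const int *v, size_t n)
      {
          bool first = true;
          size_t cnt = 0;

          for (size_t i = 0; i < n; i++) {
              first = false; /* set ... */
              if (v[i])
                  cnt++;
          }
          return cnt; /* ... but never used */
      }

      /* After: the dead variable is removed; the result is identical. */
      static size_t count_nonzero_after(const int *v, size_t n)
      {
          size_t cnt = 0;

          assert(v != NULL || n == 0);
          for (size_t i = 0; i < n; i++)
              if (v[i])
                  cnt++;
          return cnt;
      }
      ```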
    • Merge branch 'xdp_redirect-bulking' · ba926603
      Alexei Starovoitov committed
      Toke Høiland-Jørgensen says:
      
      ====================
      Since commit 96360004 ("xdp: Make devmap flush_list common for all map
      instances"), devmap flushing is a global operation instead of tied to a
      particular map. This means that with a bit of refactoring, we can finally fix
      the performance delta between the bpf_redirect_map() and bpf_redirect() helper
      functions, by introducing bulking for the latter as well.
      
      This series makes this change by moving the data structure used for the bulking
      into struct net_device itself, so we can access it even when there is no
      devmap. Once this is done, moving the bpf_redirect() helper to use the bulking
      mechanism becomes quite trivial, and brings bpf_redirect() up to the same performance as
      bpf_redirect_map():
      
                             Before:   After:
      1 CPU:
      bpf_redirect_map:      8.4 Mpps  8.4 Mpps  (no change)
      bpf_redirect:          5.0 Mpps  8.4 Mpps  (+68%)
      2 CPUs:
      bpf_redirect_map:     15.9 Mpps  16.1 Mpps  (+1% or ~no change)
      bpf_redirect:          9.5 Mpps  15.9 Mpps  (+67%)
      
      After this patch series, the only semantic difference between the two variants
      of the redirect helper (apart from the absence of a map argument, obviously) is
      that the _map() variant will return an error if passed an invalid map index,
      whereas the bpf_redirect() helper will succeed, but drop packets on
      xdp_do_redirect(). This is because the helper has no reference to the calling
      netdev, so unfortunately we can't do the ifindex lookup directly in the helper.
      
      Changelog:
      
      v3:
        - Switch two more fields to avoid a list_head spanning two cache lines
        - Include Jesper's tracepoint patch
        - Also rename xdp_do_flush_map()
        - Fix a few nits from Maciej
      
      v2:
        - Consolidate code paths and tracepoints for map and non-map redirect variants
          (Björn)
        - Add performance data for 2-CPU test (Jesper)
        - Move fields to avoid shifting cache lines in struct net_device (Eric)
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
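      The bulking idea can be sketched as a small user-space model. struct bulk_queue,
      bq_enqueue(), bq_flush() and the batch size of 16 are assumptions for illustration,
      not the kernel's struct xdp_dev_bulk_queue; the point is that the expensive driver
      transmit call is paid once per batch rather than once per frame.

      ```c
      #include <assert.h>
      #include <stddef.h>

      #define BULK_SIZE 16 /* assumed batch size for this sketch */

      struct frame { int id; };

      /* Hypothetical per-device bulk queue; in the series it lives in struct
       * net_device, so both redirect variants can reach it. */
      struct bulk_queue {
          struct frame *q[BULK_SIZE];
          size_t count;      /* frames currently queued */
          size_t xmit_calls; /* driver transmit calls made so far */
          size_t sent;       /* frames handed to the driver */
      };

      /* Stand-in for the driver's batched transmit: one call per batch. */
      static void bq_flush(struct bulk_queue *bq)
      {
          if (!bq->count)
              return;
          bq->xmit_calls++;
          bq->sent += bq->count;
          bq->count = 0;
      }

      /* Queue a frame, flushing first if the queue is already full. */
      static void bq_enqueue(struct bulk_queue *bq, struct frame *f)
      {
          if (bq->count == BULK_SIZE)
              bq_flush(bq);
          assert(bq->count < BULK_SIZE);
          bq->q[bq->count++] = f;
      }
      ```

      Queueing 40 frames and flushing once at the end of the poll loop costs 3 transmit
      calls (16 + 16 + 8) in this model, where sending each frame individually would
      cost 40.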
    • devmap: Adjust tracepoint for map-less queue flush · 58aa94f9
      Jesper Dangaard Brouer committed
      Now that we don't have a reference to a devmap when flushing the device
      bulk queue, let's change the devmap_xmit tracepoint to remove the
      map_id and map_index fields entirely. Rearrange the fields so 'drops' and
      'sent' stay in the same position in the tracepoint struct, to make it
      possible for the xdp_monitor utility to read both the old and the new
      format.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/157918768613.1458396.9165902403373826572.stgit@toke.dk
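      Keeping a field at the same offset across two record layouts can be checked at
      compile time. The field lists below are hypothetical and do not reproduce the actual
      devmap_xmit tracepoint definition; they only show how a reader that locates 'drops'
      and 'sent' by offset keeps working when other fields are removed or moved around.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical "old" record that still carries map information. */
      struct xmit_rec_old {
          int map_id;
          unsigned int act;
          unsigned int map_index;
          int drops;
          int sent;
          int from_ifindex;
          int to_ifindex;
          int err;
      };

      /* Hypothetical "new" record: map fields dropped, others rearranged. */
      struct xmit_rec_new {
          int from_ifindex;
          unsigned int act;
          int to_ifindex;
          int drops;
          int sent;
          int err;
      };

      /* Fail the build if the counters ever move. */
      _Static_assert(offsetof(struct xmit_rec_old, drops) ==
                     offsetof(struct xmit_rec_new, drops), "'drops' moved");
      _Static_assert(offsetof(struct xmit_rec_old, sent) ==
                     offsetof(struct xmit_rec_new, sent), "'sent' moved");

      /* A consumer built against the old layout reads both counters by offset. */
      static void read_counters(const void *rec, int *drops, int *sent)
      {
          const char *p = rec;

          memcpy(drops, p + offsetof(struct xmit_rec_old, drops), sizeof(*drops));
          memcpy(sent, p + offsetof(struct xmit_rec_old, sent), sizeof(*sent));
      }
      ```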
    • xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths · 1d233886
      Toke Høiland-Jørgensen committed
      Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
      we can re-use the bulking for the non-map version of the bpf_redirect()
      helper. This is a simple matter of having xdp_do_redirect_slow() queue the
      frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
      
      Unfortunately we can't make the bpf_redirect() helper return an error if
      the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
      have a reference to the network namespace of the ingress device at the time
      the helper is called. So we have to leave it as-is and keep the device
      lookup in xdp_do_redirect_slow().
      
      Since this leaves less reason to keep the non-map redirect code in a
      separate function, we get rid of the xdp_do_redirect_slow() function
      entirely. This does lose us the tracepoint disambiguation, but fortunately
      the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
      entry structures. This means both can contain a map index, so we can just
      amend the tracepoint definitions so we always emit the xdp_redirect(_err)
      tracepoints, but with the map ID only populated if a map is present. This
      means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
      the definitions around in case someone is still listening for them.
      
      With this change, the performance of the xdp_redirect sample program goes
      from 5Mpps to 8.4Mpps (a 68% increase).
      
      Since the flush functions are no longer map-specific, rename the flush()
      functions to drop _map from their names. One of the renamed functions is
      the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
      keep from having to update all drivers, use a #define to keep the old name
      working, and only update the virtual drivers in this patch.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
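      Two details of this commit lend themselves to a short C sketch: the compatibility
      #define that keeps the old flush name compiling, and a tracepoint map ID that is
      filled in only when a map is present. Only xdp_do_flush_map and its _map-less
      rename come from the text; the flush counter, legacy_driver_poll(),
      struct bpf_map_stub and redirect_trace_map_id() are hypothetical user-space
      stand-ins, not kernel code.

      ```c
      #include <assert.h>
      #include <stddef.h>

      static int flushes; /* counts flushes in this model */

      /* New, map-agnostic name. */
      static void xdp_do_flush(void)
      {
          flushes++;
      }

      /* Compatibility alias: callers still using the old name keep building. */
      #define xdp_do_flush_map xdp_do_flush

      /* Hypothetical driver that was not updated in the patch. */
      static void legacy_driver_poll(void)
      {
          xdp_do_flush_map(); /* expands to xdp_do_flush() */
      }

      /* Hypothetical minimal map object. */
      struct bpf_map_stub { unsigned int id; };

      /* Map ID for the unified tracepoint: populated only when a map is used. */
      static unsigned int redirect_trace_map_id(const struct bpf_map_stub *map)
      {
          return map ? map->id : 0;
      }
      ```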
    • xdp: Move devmap bulk queue into struct net_device · 75ccae62
      Toke Høiland-Jørgensen committed
      Commit 96360004 ("xdp: Make devmap flush_list common for all map
      instances") changed devmap flushing to be a global operation instead of a
      per-map operation. However, the queue structure used for bulking was still
      allocated as part of the containing map.
      
      This patch moves the devmap bulk queue into struct net_device. The
      motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
      which will be changed in a subsequent commit.  To avoid other fields of
      struct net_device moving to different cache lines, we also move a couple of
      other members around.
      
      We defer the actual allocation of the bulk queue structure until the
      NETDEV_REGISTER notification in devmap.c. This makes it possible to check for
      ndo_xdp_xmit support before allocating the structure, which is not possible
      at the time struct net_device is allocated. However, we keep the freeing in
      free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.
      
      Because of this change, we lose the reference back to the map that
      originated the redirect, so change the tracepoint to always return 0 as the
      map ID and index. Otherwise no functional change is intended with this
      patch.
      
      After this patch, the relevant part of struct net_device looks like this,
      according to pahole:
      
      	/* --- cacheline 14 boundary (896 bytes) --- */
      	struct netdev_queue *      _tx __attribute__((__aligned__(64))); /*   896     8 */
      	unsigned int               num_tx_queues;        /*   904     4 */
      	unsigned int               real_num_tx_queues;   /*   908     4 */
      	struct Qdisc *             qdisc;                /*   912     8 */
      	unsigned int               tx_queue_len;         /*   920     4 */
      	spinlock_t                 tx_global_lock;       /*   924     4 */
      	struct xdp_dev_bulk_queue * xdp_bulkq;           /*   928     8 */
      	struct xps_dev_maps *      xps_cpus_map;         /*   936     8 */
      	struct xps_dev_maps *      xps_rxqs_map;         /*   944     8 */
      	struct mini_Qdisc *        miniq_egress;         /*   952     8 */
      	/* --- cacheline 15 boundary (960 bytes) --- */
      	struct hlist_head  qdisc_hash[16];               /*   960   128 */
      	/* --- cacheline 17 boundary (1088 bytes) --- */
      	struct timer_list  watchdog_timer;               /*  1088    40 */
      
      	/* XXX last struct has 4 bytes of padding */
      
      	int                        watchdog_timeo;       /*  1128     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct list_head   todo_list;                    /*  1136    16 */
      	/* --- cacheline 18 boundary (1152 bytes) --- */
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/157918768397.1458396.12673224324627072349.stgit@toke.dk
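      The deferred allocation can be modelled as follows. struct net_device_stub,
      netdev_event() and free_netdev_stub() are hypothetical user-space stand-ins; the
      sketch uses a single calloc() and does not reproduce the kernel's actual allocator
      or notifier API. It shows allocation happening only at registration, and only when
      the device supports ndo_xdp_xmit, with freeing left to device teardown.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdlib.h>

      struct bulk_queue_stub { int count; }; /* hypothetical queue contents */

      struct net_device_stub {
          bool has_ndo_xdp_xmit;             /* driver implements ndo_xdp_xmit? */
          struct bulk_queue_stub *xdp_bulkq; /* NULL until registration */
      };

      enum { EV_REGISTER = 1 };

      /* Notifier model: allocate only for devices that can transmit XDP frames. */
      static int netdev_event(struct net_device_stub *dev, int event)
      {
          assert(dev);
          if (event == EV_REGISTER && dev->has_ndo_xdp_xmit && !dev->xdp_bulkq) {
              dev->xdp_bulkq = calloc(1, sizeof(*dev->xdp_bulkq));
              if (!dev->xdp_bulkq)
                  return -1;
          }
          return 0;
      }

      /* Freeing stays in the teardown path; free(NULL) is a no-op, so devices
       * that never got a queue need no special case. */
      static void free_netdev_stub(struct net_device_stub *dev)
      {
          free(dev->xdp_bulkq);
          dev->xdp_bulkq = NULL;
      }
      ```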
    • libbpf: Revert bpf_helper_defs.h inclusion regression · 20f21d98
      Andrii Nakryiko committed
      Revert bpf_helpers.h's change to include auto-generated bpf_helper_defs.h
      through <> instead of "", which causes it to be searched for in the include path. This
      can break existing applications that don't have their include path pointing
      directly to where libbpf installs its headers.
      
      There is ongoing work to make all (not just bpf_helper_defs.h) includes more
      consistent across libbpf and its consumers, but this unbreaks user code as is
      right now without any regressions. Selftests still behave sub-optimally
      (taking bpf_helper_defs.h from libbpf's source directory, if it's present
      there), which will be fixed in subsequent patches.
      
      Fixes: 6910d7d3 ("selftests/bpf: Ensure bpf_helper_defs.h are taken from selftests dir")
      Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200117004103.148068-1-andriin@fb.com
    • selftests/bpf: Fix test_progs send_signal flakiness with nmi mode · 35697c12
      Yonghong Song committed
      Alexei observed that test_progs send_signal may fail when run
      with the command line "./test_progs", while the tests pass
      when run as "./test_progs -n 40".
      
      I observed a similar issue with the nmi subtest failing
      and added a 100 us delay in commit ab8b7f0c
      ("tools/bpf: Add self tests for bpf_send_signal_thread()"),
      and the problem went away for me. But the issue still exists
      in Alexei's testing environment.
      
      The current code uses sample_freq = 50 (50 events/second), which
      may not be enough. But if the sample_freq value is larger than
      sysctl kernel/perf_event_max_sample_rate, the perf_event_open
      syscall will fail.
      
      This patch changes the nmi perf test to use sample_period = 1,
      which means sampling on every event. This seems to fix
      the issue.
      
      Fixes: ab8b7f0c ("tools/bpf: Add self tests for bpf_send_signal_thread()")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200116174004.1522812-1-yhs@fb.com
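      The difference between the two sampling modes shows up in how
      struct perf_event_attr is filled in. This sketch uses the real
      <linux/perf_event.h> UAPI header but deliberately does not call perf_event_open();
      the hardware cycles event is an assumption, not necessarily the event the selftest
      uses. In frequency mode the requested rate is checked against
      /proc/sys/kernel/perf_event_max_sample_rate; period mode asks for an overflow after
      every N events instead.

      ```c
      #include <assert.h>
      #include <string.h>
      #include <linux/perf_event.h>

      /* Before: frequency mode, about 50 samples per second. */
      static void attr_freq_mode(struct perf_event_attr *attr)
      {
          assert(attr);
          memset(attr, 0, sizeof(*attr));
          attr->size = sizeof(*attr);
          attr->type = PERF_TYPE_HARDWARE;
          attr->config = PERF_COUNT_HW_CPU_CYCLES; /* assumed event */
          attr->freq = 1;          /* interpret the union as sample_freq */
          attr->sample_freq = 50;
      }

      /* After: period mode, an overflow (and thus a sample) on every event. */
      static void attr_period_mode(struct perf_event_attr *attr)
      {
          assert(attr);
          memset(attr, 0, sizeof(*attr));
          attr->size = sizeof(*attr);
          attr->type = PERF_TYPE_HARDWARE;
          attr->config = PERF_COUNT_HW_CPU_CYCLES; /* assumed event */
          attr->sample_period = 1; /* freq stays 0: period mode */
      }
      ```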
  6. 16 Jan, 2020: 7 commits