1. 02 Aug, 2020 3 commits
  2. 01 Aug, 2020 1 commit
  3. 31 Jul, 2020 11 commits
  4. 29 Jul, 2020 2 commits
  5. 28 Jul, 2020 6 commits
  6. 26 Jul, 2020 17 commits
    • Merge branch 'bpf_link-XDP' · 47960ad6
      Alexei Starovoitov committed
      Andrii Nakryiko says:
      
      ====================
      Following cgroup and netns examples, implement bpf_link support for XDP.
      
      The semantics are described in patch #2. Program and link attachments are
      mutually exclusive, in the sense that a link cannot replace an attached
      program and a program cannot replace an attached link. A link can't replace
      an attached link either, as is the case for any other bpf_link implementation.
      
      Patch #1 refactors existing BPF program-based attachment API and centralizes
      high-level query/attach decisions in generic kernel code, while drivers are
      kept simple and are instructed with low-level decisions about attaching and
      detaching specific bpf_prog. This also makes QUERY command unnecessary, and
      patch #8 removes support for it from all kernel drivers. If that's a bad idea,
      we can drop that patch altogether.
      
      With the refactoring in patch #1, adding bpf_xdp_link is completely transparent
      to drivers: they still function at the level of the "effective" bpf_prog that
      should be called in the XDP data path.
      
      Corresponding libbpf support for BPF XDP link is added in patch #5.
      
      v3->v4:
      - fix a compilation warning in one of the drivers (Jakub);
      
      v2->v3:
      - fix build when CONFIG_BPF_SYSCALL=n (kernel test robot);
      
      v1->v2:
      - fix prog refcounting bug (David);
      - split dev_change_xdp_fd() changes into 2 patches (David);
      - add extack messages to all user-induced errors (David).
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commands · e8407fde
      Andrii Nakryiko committed
      Now that BPF program/link management is centralized in generic net_device
      code, kernel code never queries program id from drivers, so
      XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.
      
      This patch removes all the implementations of those commands in the kernel,
      along with xdp_attachment_query().
      
      This patch was compile-tested on allyesconfig.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
    • selftests/bpf: Add BPF XDP link selftests · fe48230c
      Andrii Nakryiko committed
      Add a selftest validating all the attachment logic around BPF XDP link. Also
      test link updates and the get_obj_info() APIs.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200722064603.3350758-9-andriin@fb.com
    • libbpf: Add support for BPF XDP link · dc8698ca
      Andrii Nakryiko committed
      Sync the UAPI header and add support for using bpf_link-based XDP attachment.
      Make the xdp/ prog type set an expected attach type. The kernel didn't enforce
      attach_type for XDP programs before, so there are no backwards compatibility
      issues there.
      
      Also fix section_names selftest to recognize that xdp prog types now have
      expected attach type.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200722064603.3350758-8-andriin@fb.com
    • bpf: Implement BPF XDP link-specific introspection APIs · c1931c97
      Andrii Nakryiko committed
      Implement XDP link-specific show_fdinfo and link_info to emit ifindex.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200722064603.3350758-7-andriin@fb.com
    • bpf, xdp: Implement LINK_UPDATE for BPF XDP link · 026a4c28
      Andrii Nakryiko committed
      Add support for LINK_UPDATE command for BPF XDP link to enable reliable
      replacement of underlying BPF program.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200722064603.3350758-6-andriin@fb.com
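The idea of replacing the program behind a link can be illustrated with a small userspace model. This is only a sketch: `model_link`, `model_link_update`, the "expected old program" argument, and the chosen error codes are assumptions made for illustration and are not the kernel implementation.

```c
#include <errno.h>

/* Hypothetical model of a link that owns exactly one program (by id). */
struct model_link {
    int prog_id;
};

/*
 * Replace the program behind the link.
 * expected_old == 0 means "replace unconditionally"; otherwise the update
 * only succeeds if the link still points at expected_old, so a caller
 * cannot silently overwrite a program that someone else installed.
 */
int model_link_update(struct model_link *link, int new_prog, int expected_old)
{
    if (new_prog <= 0)
        return -EINVAL;          /* illustrative error code */
    if (expected_old && link->prog_id != expected_old)
        return -EPERM;           /* illustrative error code */
    link->prog_id = new_prog;
    return 0;
}
```

The link itself stays attached throughout, so in this model there is no window in which no program is installed.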
    • bpf, xdp: Add bpf_link-based XDP attachment API · aa8d3a71
      Andrii Nakryiko committed
      Add bpf_link-based API (bpf_xdp_link) to attach BPF XDP program through
      BPF_LINK_CREATE command.
      
      bpf_xdp_link is mutually exclusive with direct BPF program attachment,
      previous BPF program should be detached prior to attempting to create a new
      bpf_xdp_link attachment (for a given XDP mode). Once BPF link is attached, it
      can't be replaced by other BPF program attachment or link attachment. It will
      be detached only when the last BPF link FD is closed.
      
      bpf_xdp_link will be auto-detached when the net_device is shut down, similarly
      to how other BPF links behave (cgroup, flow_dissector). At that point bpf_link
      will become defunct, but won't be destroyed until the last FD is closed.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200722064603.3350758-5-andriin@fb.com
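The mutual-exclusion rules in this commit message can be sketched as a small userspace model. Everything here (`model_dev`, `attach_prog`, `attach_link`, `release_link`, and the use of `-EBUSY`) is hypothetical and chosen for illustration; it is not the kernel code.

```c
#include <errno.h>

/* Hypothetical model: one attachment slot per XDP mode. */
enum model_mode { MODE_SKB, MODE_DRV, MODE_HW, MODE_MAX };

struct model_slot {
    int prog_id;   /* 0 means nothing is attached */
    int is_link;   /* nonzero if the attachment is owned by a link */
};

struct model_dev {
    struct model_slot slots[MODE_MAX];
};

/* Direct program attach (prog_id == 0 detaches): refused while a link owns the slot. */
int attach_prog(struct model_dev *dev, enum model_mode mode, int prog_id)
{
    struct model_slot *s = &dev->slots[mode];

    if (s->prog_id && s->is_link)
        return -EBUSY;           /* illustrative error code */
    s->prog_id = prog_id;
    s->is_link = 0;
    return 0;
}

/* Link attach: refused if a program or another link is already attached. */
int attach_link(struct model_dev *dev, enum model_mode mode, int prog_id)
{
    struct model_slot *s = &dev->slots[mode];

    if (s->prog_id)
        return -EBUSY;           /* illustrative error code */
    s->prog_id = prog_id;
    s->is_link = 1;
    return 0;
}

/* Models closing the last link FD, the only way to free a link-owned slot. */
void release_link(struct model_dev *dev, enum model_mode mode)
{
    struct model_slot *s = &dev->slots[mode];

    if (s->is_link) {
        s->prog_id = 0;
        s->is_link = 0;
    }
}
```

Slots are tracked per mode, which matches the "for a given XDP mode" qualifier above: a link in one mode does not block an attachment in another.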
    • bpf, xdp: Extract common XDP program attachment logic · d4baa936
      Andrii Nakryiko committed
      Further refactor the XDP attachment code. dev_change_xdp_fd() is split into two
      parts: getting bpf_progs from FDs, and the attachment logic that works with
      bpf_progs. This makes the attachment logic a bit more straightforward and
      prepares the code for bpf_xdp_link inclusion, which will share the common logic.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200722064603.3350758-4-andriin@fb.com
    • bpf, xdp: Maintain info on attached XDP BPF programs in net_device · 7f0a8382
      Andrii Nakryiko committed
      Instead of delegating to drivers, maintain information about which BPF
      programs are attached in which XDP modes (generic/skb, driver, or hardware)
      locally in net_device. This effectively obsoletes XDP_QUERY_PROG command.
      
      Such a reorganization already simplifies existing code. But it also allows
      adding bpf_link-based XDP attachments without drivers having to know
      about any of this at all, which seems like a good setup.
      XDP_SETUP_PROG/XDP_SETUP_PROG_HW are just low-level commands to driver to
      install/uninstall active BPF program. All the higher-level concerns about
      prog/link interaction will be contained within generic driver-agnostic logic.
      
      All the XDP_QUERY_PROG calls to the driver in dev_xdp_uninstall() were removed.
      It's not clear to me why dev_xdp_uninstall() was passing previous prog_flags
      when resetting installed programs. That seems unnecessary, plus most drivers
      don't populate prog_flags anyway. Having XDP_SETUP_PROG vs XDP_SETUP_PROG_HW
      should be enough of an indicator of what is required of the driver to correctly
      reset the active BPF program. dev_xdp_uninstall() is also generalized as an
      iteration over all three supported modes.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200722064603.3350758-3-andriin@fb.com
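As a rough illustration of keeping attachment state in generic code, the following userspace model answers queries from its own per-mode bookkeeping and uninstalls by iterating over every mode. All names (`model_dev`, `driver_setup`, `query_prog_id`, `install`, `uninstall_all`) are hypothetical stand-ins, not kernel functions.

```c
#include <stddef.h>

enum model_mode { MODE_SKB, MODE_DRV, MODE_HW, MODE_COUNT };

struct model_prog {
    int id;
};

struct model_dev {
    const struct model_prog *attached[MODE_COUNT]; /* generic-code bookkeeping */
    int driver_calls;  /* how many low-level install/uninstall requests were made */
};

/* Stand-in for the low-level "install or uninstall this program" driver command. */
void driver_setup(struct model_dev *dev, enum model_mode mode,
                  const struct model_prog *prog)
{
    (void)mode;
    (void)prog;
    dev->driver_calls++;
}

/* Queries are answered from generic state; the driver is never asked. */
int query_prog_id(const struct model_dev *dev, enum model_mode mode)
{
    return dev->attached[mode] ? dev->attached[mode]->id : 0;
}

void install(struct model_dev *dev, enum model_mode mode,
             const struct model_prog *prog)
{
    driver_setup(dev, mode, prog);
    dev->attached[mode] = prog;
}

/* Uninstall by iterating over all modes; returns how many programs were removed. */
int uninstall_all(struct model_dev *dev)
{
    int removed = 0;

    for (int m = 0; m < MODE_COUNT; m++) {
        if (!dev->attached[m])
            continue;
        driver_setup(dev, (enum model_mode)m, NULL);
        dev->attached[m] = NULL;
        removed++;
    }
    return removed;
}
```

Because the mode alone tells the driver what to reset, the model passes no per-program flags on uninstall.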
    • bpf: Make bpf_link API available independently of CONFIG_BPF_SYSCALL · 6cc7d1e8
      Andrii Nakryiko committed
      Similarly to bpf_prog, make bpf_link and related generic API available
      unconditionally to make it easier to have bpf_link support in various parts of
      the kernel. Stub out init/prime/settle/cleanup and inc/put APIs.
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200722064603.3350758-2-andriin@fb.com
    • bpf: Fix build on architectures with special bpf_user_pt_regs_t · 2b9b305f
      Song Liu committed
      Architectures like s390, powerpc, arm64, and riscv have a special definition
      of bpf_user_pt_regs_t, so we need to cast the pointer before passing it to
      bpf_get_stack(). This is similar to bpf_get_stack_tp().
      
      Fixes: 03d42fd2d83f ("bpf: Separate bpf_get_[stack|stackid] for perf events BPF")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200724200503.3629591-1-songliubraving@fb.com
    • bpf/local_storage: Fix build without CONFIG_CGROUP · dfcdf0e9
      YiFei Zhu committed
      local_storage.o has CONFIG_BPF_SYSCALL as its compile guard, which
      does not imply that CONFIG_CGROUP is on. Including cgroup-internal.h
      when CONFIG_CGROUP is off causes a compilation failure.
      
      Fixes: f67cfc233706 ("bpf: Make cgroup storages shared between programs on the same cgroup")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: YiFei Zhu <zhuyifei@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200724211753.902969-1-zhuyifei1999@gmail.com
    • Merge branch 'shared-cgroup-storage' · 36f72484
      Alexei Starovoitov committed
      YiFei Zhu says:
      
      ====================
      To access the storage in a CGROUP_STORAGE map, one uses
      bpf_get_local_storage helper, which is extremely fast due to its
      use of per-CPU variables. However, its whole code is built on
      the assumption that one map can only be used by one program at any
      time, and this prohibits any sharing of data between multiple
      programs using these maps, eliminating a lot of use cases, such
      as some per-cgroup configuration storage, written to by a
      setsockopt program and read by a cg_sock_addr program.
      
      Why not use other map types? The great part of the CGROUP_STORAGE map
      is that it is isolated by the different cgroups it is attached to. When
      one program uses bpf_get_local_storage, even on the same map, it
      gets different storages if it is run as a result of attaching
      to different cgroups. The kernel manages the storages, simplifying
      BPF program or userspace. In theory, one could probably use other
      maps like array or hash to do the same thing, but it would be a
      major overhead / complexity. Userspace needs to know when a cgroup
      is being freed in order to free up a space in the replacement map.
      
      This patch set introduces a significant change to the semantics of
      CGROUP_STORAGE map type. Instead of each storage being tied to one
      single attachment, it is shared across different attachments to
      the same cgroup, and persists until either the map or the cgroup
      attached to is being freed.
      
      Users may use u64 as the key to the map, and the result would be
      that the attach type is ignored during key comparison, and
      programs of different attach types will share the same storage if
      the cgroups they are attached to are the same.
      
      How could this break existing users?
      * Users that use detach & reattach / program replacement as a
        shortcut to zeroing the storage. Since we need sharing between
        programs, we cannot zero the storage. Users that expect this
        behavior should either attach a program with a new map, or
        explicitly zero the map with a syscall.
      This case is dependent on undocumented implementation details,
      so the impact should be very minimal.
      
      Patch 1 introduces a test on the old expected behavior of the map
      type.
      
      Patch 2 introduces a test showing how two programs cannot share
      one such map.
      
      Patch 3 implements the change of semantics to the map.
      
      Patch 4 amends the new test such that it yields the behavior we
      expect from the change.
      
      Patch 5 documents the map type.
      
      Changes since RFC:
      * Clarify commit message in patch 3 such that it says the lifetime
        of the storage is ended at the freeing of the cgroup_bpf, rather
        than the cgroup itself.
      * Restored an -ENOMEM check in __cgroup_bpf_attach.
      * Update selftests for recent change in network_helpers API.
      
      Changes since v1:
      * s/CHECK_FAIL/CHECK/
      * s/bpf_prog_attach/bpf_program__attach_cgroup/
      * Moved test__start_subtest to test_cg_storage_multi.
      * Removed some redundant CHECK_FAIL where they are already CHECK-ed.
      
      Changes since v2:
      * Lock cgroup_mutex during map_free.
      * Publish new storages only if attach is successful, by tracking
        exactly which storages are reused in an array of bools.
      * Mention bpftool map dump showing a value of zero for attach_type
        in patch 3 commit message.
      
      Changes since v3:
      * Use a much simpler lookup and allocate-if-not-exist from the fact
        that cgroup_mutex is locked during attach.
      * Removed an unnecessary spinlock hold.
      
      Changes since v4:
      * Changed semantics so that if the key type is struct
        bpf_cgroup_storage_key the map retains isolation between different
        attach types. Sharing between different attach types only occur
        when key type is u64.
      * Adapted tests and docs for the above change.
      
      Changes since v5:
      * Removed redundant NULL check before bpf_link__destroy.
      * Free BPF object explicitly, after asserting that object failed to
        load, in the event that the object did not fail to load.
      * Rename variable in bpf_cgroup_storage_key_cmp for clarity.
      * Added a lot of information to Documentation, more or less copied
        from what Martin KaFai Lau wrote.
      ====================
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Documentation/bpf: Document CGROUP_STORAGE map type · 4e15f460
      YiFei Zhu committed
      The mechanics and usage are not very straightforward. Given the
      changes, it's better to document how it works and how to use it,
      rather than having to rely on the examples and implementation to
      infer what is going on.
      Signed-off-by: YiFei Zhu <zhuyifei@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/b412edfbb05cb1077c9e2a36a981a54ee23fa8b3.1595565795.git.zhuyifei@google.com
    • selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress · 3573f384
      YiFei Zhu committed
      This mirrors the original egress-only test. The cgroup_storage is
      now extended to have two packet counters, one for egress and one
      for ingress. We also extend it to have two egress programs to test
      that egress will always share with other egress programs in the
      same cgroup. The behavior of the counters is exactly the same as
      in the original egress-only test.
      
      The test is split into two. The "isolated" test checks that, when
      the key type is struct bpf_cgroup_storage_key, which contains the
      attach type, programs of different attach types see different
      storages. The "shared" test checks that, when the key type is u64,
      programs of different attach types see the same storage if
      they are attached to the same cgroup.
      Signed-off-by: YiFei Zhu <zhuyifei@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/c756f5f1521227b8e6e90a453299dda722d7324d.1595565795.git.zhuyifei@google.com
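The difference between the "isolated" and "shared" cases comes down to whether the attach type takes part in key comparison. A minimal sketch, assuming a hypothetical `model_key` struct and `model_key_cmp` function (the real kernel types and comparison routine are not reproduced here):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key: a cgroup identifier plus the attach type. */
struct model_key {
    uint64_t cgroup_id;
    uint32_t attach_type;
};

/*
 * Compare two keys, returning <0, 0, or >0.
 * isolated == true  models the struct-typed key: attach type is compared.
 * isolated == false models the u64 key: attach type is ignored.
 */
int model_key_cmp(bool isolated, const struct model_key *a,
                  const struct model_key *b)
{
    if (a->cgroup_id != b->cgroup_id)
        return a->cgroup_id < b->cgroup_id ? -1 : 1;
    if (isolated && a->attach_type != b->attach_type)
        return a->attach_type < b->attach_type ? -1 : 1;
    return 0;
}
```

With `isolated` false, an ingress program and an egress program on the same cgroup resolve to the same storage; with `isolated` true they do not.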
    • Merge branch 'fix-bpf_get_stack-with-PEBS' · 90065c06
      Alexei Starovoitov committed
      Song Liu says:
      
      ====================
      Calling get_perf_callchain() on perf_events from PEBS entries may cause
      unwinder errors. To fix this issue, the perf subsystem fetches the callchain
      early and marks such perf_events with __PERF_SAMPLE_CALLCHAIN_EARLY.
      A similar issue exists when a BPF program calls get_perf_callchain() via
      helper functions. For more information about this issue, please refer to
      the discussions in [1].
      
      This set fixes this issue with helper proto bpf_get_stackid_pe and
      bpf_get_stack_pe.
      
      [1] https://lore.kernel.org/lkml/ED7B9430-6489-4260-B3C5-9CFA2E3AA87A@fb.com/
      
      Changes v4 => v5:
      1. Return -EPROTO instead of -EINVAL on PERF_EVENT_IOC_SET_BPF errors.
         (Alexei)
      2. Let libbpf print a hint message when PERF_EVENT_IOC_SET_BPF returns
         -EPROTO. (Alexei)
      
      Changes v3 => v4:
      1. Fix error check logic in bpf_get_stackid_pe and bpf_get_stack_pe.
         (Alexei)
      2. Do not allow attaching BPF programs with bpf_get_stack|stackid to
         perf_event with precise_ip > 0, but not proper callchain. (Alexei)
      3. Add selftest get_stackid_cannot_attach.
      
      Changes v2 => v3:
      1. Fix handling of stackmap skip field. (Andrii)
      2. Simplify the code in a few places. (Andrii)
      
      Changes v1 => v2:
      1. Simplify the design and avoid introducing new helper function. (Andrii)
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
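The attach-time restriction listed in the v3 => v4 and v4 => v5 notes can be sketched as a simple predicate. This is a userspace model built only from the wording of the cover letter; `model_event`, `model_prog`, and `model_set_bpf` are hypothetical names, and the real kernel check may test different fields.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical view of the perf_event properties that matter here. */
struct model_event {
    int precise_ip;        /* > 0 means precise (PEBS-style) sampling */
    bool callchain_early;  /* callchain was already fetched by perf */
};

/* Hypothetical view of the BPF program. */
struct model_prog {
    bool calls_get_stack;  /* uses bpf_get_stack or bpf_get_stackid */
};

/* Models the attach request: refuse combinations that could hit the unwinder. */
int model_set_bpf(const struct model_event *ev, const struct model_prog *prog)
{
    if (prog->calls_get_stack && ev->precise_ip > 0 && !ev->callchain_early)
        return -EPROTO;
    return 0;
}
```

A distinct error code such as -EPROTO lets userspace tell this case apart from generic invalid-argument failures and print a targeted hint.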
    • bpf: Make cgroup storages shared between programs on the same cgroup · 7d9c3427
      YiFei Zhu committed
      This change comes in several parts:
      
      First, the restriction that the CGROUP_STORAGE map can only be used
      by one program is removed. This results in the removal of the field
      'aux' in struct bpf_cgroup_storage_map, and removal of relevant
      code associated with the field, and removal of now-noop functions
      bpf_free_cgroup_storage and bpf_cgroup_storage_release.
      
      Second, we permit a key of type u64 as the key to the map.
      Providing such a key type indicates that the map should ignore
      attach type when comparing map keys. However, for simplicity newly
      linked storage will still have the attach type at link time in
      its key struct. cgroup_storage_check_btf is adapted to accept
      u64 as the type of the key.
      
      Third, because the storages are now shared, the storages cannot
      be unconditionally freed on program detach. There could be two
      ways to solve this issue:
      * A. Reference count the usage of the storages, and free when the
           last program is detached.
      * B. Free only when the storage is impossible to be referred to
           again, i.e. when either the cgroup_bpf it is attached to, or
           the map itself, is freed.
      Option A has the side effect that, when the user detaches and
      reattaches a program, whether the program gets a fresh storage
      depends on whether there is another program attached using that
      storage. This could trigger races if the user is multi-threaded,
      and since nondeterminism in data races is evil, go with option B.
      
      Both the map and the cgroup_bpf now track their associated
      storages, and the storage unlink and free are removed from
      cgroup_bpf_detach and added to cgroup_bpf_release and
      cgroup_storage_map_free. The latter also now holds the cgroup_mutex
      to prevent any races with the former.
      
      Fourth, on attach, we reuse the old storage if the key already
      exists in the map, via cgroup_storage_lookup. If the storage
      does not exist yet, we create a new one, and publish it at the
      last step in the attach process. This does not create a race
      condition because for the whole attach the cgroup_mutex is held.
      We keep track of an array of new storages that were allocated,
      and if the process fails only the new storages get freed.
      Signed-off-by: YiFei Zhu <zhuyifei@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/d5401c6106728a00890401190db40020a1f84ff1.1595565795.git.zhuyifei@google.com
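The fourth point (reuse an existing storage, otherwise allocate, publish last, and free only the new storages on failure) can be sketched in userspace as follows. The model is an assumption-laden illustration: `model_map`, `map_lookup`, `model_attach`, the fixed capacities, and the `fail_late` switch are all hypothetical, and no locking is shown because the model is single-threaded.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define N_TYPES 2   /* hypothetical number of storage maps a program uses */
#define MAP_CAP 8   /* hypothetical capacity of each model map */

struct storage {
    unsigned long long cgroup_id;
    long counter;
};

struct model_map {
    struct storage *published[MAP_CAP];
    int n;
};

/* Find an already published storage for this cgroup, or NULL. */
struct storage *map_lookup(const struct model_map *map,
                           unsigned long long cgroup_id)
{
    for (int i = 0; i < map->n; i++)
        if (map->published[i]->cgroup_id == cgroup_id)
            return map->published[i];
    return NULL;
}

/* fail_late simulates an error that happens after the storages were prepared. */
int model_attach(struct model_map *maps[N_TYPES],
                 unsigned long long cgroup_id, bool fail_late)
{
    struct storage *st[N_TYPES] = { NULL };
    bool is_new[N_TYPES] = { false };
    int i, err = 0;

    for (i = 0; i < N_TYPES; i++) {
        st[i] = map_lookup(maps[i], cgroup_id);   /* reuse if it exists */
        if (st[i])
            continue;
        if (maps[i]->n == MAP_CAP) {
            err = -ENOMEM;
            goto cleanup;
        }
        st[i] = calloc(1, sizeof(*st[i]));
        if (!st[i]) {
            err = -ENOMEM;
            goto cleanup;
        }
        st[i]->cgroup_id = cgroup_id;
        is_new[i] = true;                          /* remember what we allocated */
    }

    if (fail_late) {
        err = -EINVAL;
        goto cleanup;
    }

    /* Last step: publish only the storages that were newly allocated. */
    for (i = 0; i < N_TYPES; i++)
        if (is_new[i])
            maps[i]->published[maps[i]->n++] = st[i];
    return 0;

cleanup:
    /* Reused storages stay untouched; only fresh allocations are freed. */
    for (i = 0; i < N_TYPES; i++)
        if (is_new[i])
            free(st[i]);
    return err;
}
```

Publishing last means a failed attach never leaves a half-initialized storage visible to other programs, and a repeated attach for the same cgroup sees the existing data rather than a zeroed copy.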