1. 19 10月, 2019 3 次提交
  2. 18 10月, 2019 9 次提交
    • A
      Merge branch 'cleanup-selftests-bpf-makefile' · 47a92ae0
      Alexei Starovoitov 提交于
      Andrii Nakryiko says:
      
      ====================
      This patch set extensively revamps selftests/bpf's Makefile to generalize test
      runner concept and apply it uniformly to test_maps and test_progs test
      runners, along with test_progs' few build "flavors", exercising various ways
      to build BPF programs.
      
      As we do that, we fix dependencies between various phases of test runners, and
      simplify some one-off rules and dependencies currently present in Makefile.
      test_progs' flavors are now built into root $(OUTPUT) directory and can be run
      without any extra steps right from there. E.g., test_progs-alu32 is built and
      is supposed to be run from $(OUTPUT). It will cd into alu32/ subdirectory to
      load correct set of BPF object files (which are different from the ones built
      for test_progs).
      
      Outline:
      - patch #1 teaches test_progs about flavor sub-directories;
      - patch #2 fixes one of CO-RE tests to not depend strictly on process name;
      - patch #3 changes test_maps's usage of map_tests/tests.h to be the same as
        test_progs' one;
      - patch #4 adds convenient short `make test_progs`-like targets to build only
        individual tests, if necessary;
      - patch #5 is a main patch in the series; it uses a bunch of make magic
        (mainly $(call) and $(eval)) to define test runner "skeleton" and apply it
        to 4 different test runners, lots more details in corresponding commit
        description;
      - patch #6 does a bit of post-clean up for test_queue_map and test_stack_map
        BPF programs;
      - patch #7 cleans up test_libbpf.sh/test_libbpf_open superseded by test_progs.
      
      v3->v4:
      - remove accidentally checked in binaries;
      
      v2->v3:
      - drop test_xdp.o mixed compilation mode, remove test_libbpf.sh (Alexei);
      
      v1->v2:
      - drop test_progs-native causing compilation failures due to
        __builtin_preserve_field_access, add back test_xdp.o override, which will
        now emit rule re-definition warning.
      ====================
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      47a92ae0
    • A
      selftest/bpf: Remove test_libbpf.sh and test_libbpf_open · cb79a4e1
      Andrii Nakryiko 提交于
      test_progs is much more sophisticated superset of tests compared to
      test_libbpf.sh and test_libbpf_open. Remove test_libbpf.sh and
      test_libbpf_open.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191016060051.2024182-8-andriin@fb.com
      cb79a4e1
    • A
      selftests/bpf: Move test_queue_stack_map.h into progs/ where it belongs · 5ac93074
      Andrii Nakryiko 提交于
      test_queue_stack_map.h is used only from BPF programs. Thus it should be
      part of progs/ subdir. An added benefit of moving it there is that new
      TEST_RUNNER_DEFINE_RULES macro-rule will properly capture dependency on
      this header for all BPF objects and trigger re-build, if it changes.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191016060051.2024182-7-andriin@fb.com
      5ac93074
    • A
      selftests/bpf: Replace test_progs and test_maps w/ general rule · 74b5a596
      Andrii Nakryiko 提交于
      Define test runner generation meta-rule that codifies dependencies
      between test runner, its tests, and its dependent BPF programs. Use that
      for defining test_progs and test_maps test-runners. Also additionally define
      2 flavors of test_progs:
      - alu32, which builds BPF programs with 32-bit registers codegen;
      - bpf_gcc, which build BPF programs using GCC, if it supports BPF target.
      
      Overall, this is accomplished through $(eval)'ing a set of generic
      rules, which defines Makefile targets dynamically at runtime. See
      comments explaining the need for 2 $(evals), though.
      
      For each test runner we have (test_maps and test_progs, currently), and,
      optionally, their flavors, the logic of build process is modeled as
      follows (using test_progs as an example):
      - all BPF objects are in progs/:
        - BPF object's .o file is built into output directory from
          corresponding progs/.c file;
        - all BPF objects in progs/*.c depend on all progs/*.h headers;
        - all BPF objects depend on bpf_*.h helpers from libbpf (but not
          libbpf archive). There is an extra rule to trigger bpf_helper_defs.h
          (re-)build, if it's not present/outdated);
        - build recipe for BPF object can be re-defined per test runner/flavor;
      - test files are built from prog_tests/*.c:
        - all such test file objects are built on individual file basis;
        - currently, every single test file depends on all BPF object files;
          this might be improved in follow up patches to do 1-to-1 dependency,
          but allowing to customize this per each individual test;
        - each test runner definition can specify a list of extra .c and .h
          files to be built along test files and test runner binary; all such
          headers are becoming automatic dependency of each test .c file;
        - due to test files sometimes embedding (using .incbin assembly
          directive) contents of some BPF objects at compilation time, which are
          expected to be in CWD of compiler, compilation for test file object does
          cd into test runner's output directory; to support this mode all the
          include paths are turned into absolute paths using $(abspath) make
          function;
      - prog_tests/test.h is automatically (re-)generated with an entry for
        each .c file in prog_tests/;
      - final test runner binary is linked together from test object files and
        extra object files, linking together libbpf's archive as well;
      - it's possible to specify extra "resource" files/targets, which will be
        copied into test runner output directory, if it differes from
        Makefile-wide $(OUTPUT). This is used to ensure btf_dump test cases and
        urandom_read binary is put into a test runner's CWD for tests to find
        them in runtime.
      
      For flavored test runners, their output directory is a subdirectory of
      common Makefile-wide $(OUTPUT) directory with flavor name used as
      subdirectory name.
      
      BPF objects targets might be reused between different test runners, so
      extra checks are employed to not double-define them. Similarly, we have
      redefinition guards for output directories and test headers.
      
      test_verifier follows slightly different patterns and is simple enough
      to not justify generalizing TEST_RUNNER_DEFINE/TEST_RUNNER_DEFINE_RULES
      further to accomodate these differences. Instead, rules for
      test_verifier are minimized and simplified, while preserving correctness
      of dependencies.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191016060051.2024182-6-andriin@fb.com
      74b5a596
    • A
      selftests/bpf: Add simple per-test targets to Makefile · 03dcb784
      Andrii Nakryiko 提交于
      Currently it's impossible to do `make test_progs` and have only
      test_progs be built, because all the binary targets are defined in terms
      of $(OUTPUT)/<binary>, and $(OUTPUT) is absolute path to current
      directory (or whatever gets overridden to by user).
      
      This patch adds simple re-directing targets for all test targets making
      it possible to do simple and nice `make test_progs` (and any other
      target).
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191016060051.2024182-5-andriin@fb.com
      03dcb784
    • A
      selftests/bpf: Switch test_maps to test_progs' test.h format · ee6c52e9
      Andrii Nakryiko 提交于
      Make test_maps use tests.h header format consistent with the one used by
      test_progs, to facilitate unification.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191016060051.2024182-4-andriin@fb.com
      ee6c52e9
    • A
      selftests/bpf: Make CO-RE reloc test impartial to test_progs flavor · d25c5e23
      Andrii Nakryiko 提交于
      test_core_reloc_kernel test captures its own process name and validates
      it as part of the test. Given extra "flavors" of test_progs, this break
      for anything by default test_progs binary. Fix the test to cut out
      flavor part of the process name.
      
      Fixes: ee2eb063 ("selftests/bpf: Add BPF_CORE_READ and BPF_CORE_READ_STR_INTO macro tests")
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191016060051.2024182-3-andriin@fb.com
      d25c5e23
    • A
      selftests/bpf: Teach test_progs to cd into subdir · 0b6e71c3
      Andrii Nakryiko 提交于
      We are building a bunch of "flavors" of test_progs, e.g., w/ alu32 flag
      for Clang when building BPF object. test_progs setup is relying on
      having all the BPF object files and extra resources to be available in
      current working directory, though. But we actually build all these files
      into a separate sub-directory. Next set of patches establishes
      convention of naming "flavored" test_progs (and test runner binaries in
      general) as test_progs-flavor (e.g., test_progs-alu32), for each such
      extra flavor. This patch teaches test_progs binary to automatically
      detect its own extra flavor based on its argv[0], and if present, to
      change current directory to a flavor-specific subdirectory.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191016060051.2024182-2-andriin@fb.com
      0b6e71c3
    • J
      selftests/bpf: Restore the netns after flow dissector reattach test · 8d285a3b
      Jakub Sitnicki 提交于
      flow_dissector_reattach test changes the netns we run in but does not
      restore it to the one we started in when finished. This interferes with
      tests that run after it. Fix it by restoring the netns when done.
      
      Fixes: f97eea17 ("selftests/bpf: Check that flow dissector can be re-attached")
      Reported-by: NAlexei Starovoitov <ast@kernel.org>
      Reported-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191017083752.30999-1-jakub@cloudflare.com
      8d285a3b
  3. 17 10月, 2019 13 次提交
    • D
      Merge branch 'bpf-btf-trace' · 0142fdc8
      Daniel Borkmann 提交于
      Alexei Starovoitov says:
      
      ====================
      v2->v3:
      - while trying to adopt btf-based tracing in production service realized
        that disabling bpf_probe_read() was premature. The real tracing program
        needs to see much more than this type safe tracking can provide.
        With these patches the verifier will be able to see that skb->data
        is a pointer to 'u8 *', but it cannot possibly know how many bytes
        of it is readable. Hence bpf_probe_read() is necessary to do basic
        packet reading from tracing program. Some helper can be introduced to
        solve this particular problem, but there are other similar structures.
        Another issue is bitfield reading. The support for bitfields
        is coming to llvm. libbpf will be supporting it eventually as well,
        but there will be corner cases where bpf_probe_read() is necessary.
        The long term goal is still the same: get rid of probe_read eventually.
      - fixed build issue with clang reported by Nathan Chancellor.
      - addressed a ton of comments from Andrii.
        bitfields and arrays are explicitly unsupported in btf-based tracking.
        This will be improved in the future.
        Right now the verifier is more strict than necessary.
        In some cases it can fall back to 'scalar' instead of rejecting
        the program, but rejection today allows to make better decisions
        in the future.
      - adjusted testcase to demo bitfield and skb->data reading.
      
      v1->v2:
      - addressed feedback from Andrii and Eric. Thanks a lot for review!
      - added missing check at raw_tp attach time.
      - Andrii noticed that expected_attach_type cannot be reused.
        Had to introduce new field to bpf_attr.
      - cleaned up logging nicely by introducing bpf_log() helper.
      - rebased.
      
      Revolutionize bpf tracing and bpf C programming.
      C language allows any pointer to be typecasted to any other pointer
      or convert integer to a pointer.
      Though bpf verifier is operating at assembly level it has strict type
      checking for fixed number of types.
      Known types are defined in 'enum bpf_reg_type'.
      For example:
      PTR_TO_FLOW_KEYS is a pointer to 'struct bpf_flow_keys'
      PTR_TO_SOCKET is a pointer to 'struct bpf_sock',
      and so on.
      
      When it comes to bpf tracing there are no types to track.
      bpf+kprobe receives 'struct pt_regs' as input.
      bpf+raw_tracepoint receives raw kernel arguments as an array of u64 values.
      It was up to bpf program to interpret these integers.
      Typical tracing program looks like:
      int bpf_prog(struct pt_regs *ctx)
      {
          struct net_device *dev;
          struct sk_buff *skb;
          int ifindex;
      
          skb = (struct sk_buff *) ctx->di;
          bpf_probe_read(&dev, sizeof(dev), &skb->dev);
          bpf_probe_read(&ifindex, sizeof(ifindex), &dev->ifindex);
      }
      Addressing mistakes will not be caught by C compiler or by the verifier.
      The program above could have typecasted ctx->si to skb and page faulted
      on every bpf_probe_read().
      bpf_probe_read() allows reading any address and suppresses page faults.
      Typical program has hundreds of bpf_probe_read() calls to walk
      kernel data structures.
      Not only tracing program would be slow, but there was always a risk
      that bpf_probe_read() would read mmio region of memory and cause
      unpredictable hw behavior.
      
      With introduction of Compile Once Run Everywhere technology in libbpf
      and in LLVM and BPF Type Format (BTF) the verifier is finally ready
      for the next step in program verification.
      Now it can use in-kernel BTF to type check bpf assembly code.
      
      Equivalent program will look like:
      struct trace_kfree_skb {
          struct sk_buff *skb;
          void *location;
      };
      SEC("raw_tracepoint/kfree_skb")
      int trace_kfree_skb(struct trace_kfree_skb* ctx)
      {
          struct sk_buff *skb = ctx->skb;
          struct net_device *dev;
          int ifindex;
      
          __builtin_preserve_access_index(({
              dev = skb->dev;
              ifindex = dev->ifindex;
          }));
      }
      
      These patches teach bpf verifier to recognize kfree_skb's first argument
      as 'struct sk_buff *' because this is what kernel C code is doing.
      The bpf program cannot 'cheat' and say that the first argument
      to kfree_skb raw_tracepoint is some other type.
      The verifier will catch such type mismatch between bpf program
      assumption of kernel code and the actual type in the kernel.
      
      Furthermore skb->dev access is type tracked as well.
      The verifier can see which field of skb is being read
      in bpf assembly. It will match offset to type.
      If bpf program has code:
      struct net_device *dev = (void *)skb->len;
      C compiler will not complain and generate bpf assembly code,
      but the verifier will recognize that integer 'len' field
      is being accessed at offsetof(struct sk_buff, len) and will reject
      further dereference of 'dev' variable because it contains
      integer value instead of a pointer.
      
      Such sophisticated type tracking allows calling networking
      bpf helpers from tracing programs.
      This patchset allows calling bpf_skb_event_output() that dumps
      skb data into perf ring buffer.
      It greatly improves observability.
      Now users can not only see packet lenth of the skb
      about to be freed in kfree_skb() kernel function, but can
      dump it to user space via perf ring buffer using bpf helper
      that was previously available only to TC and socket filters.
      See patch 10 for full example.
      
      The end result is safer and faster bpf tracing.
      Safer - because type safe direct load can be used most of the time
      instead of bpf_probe_read().
      Faster - because direct loads are used to walk kernel data structures
      instead of bpf_probe_read() calls.
      Note that such loads can page fault and are supported by
      hidden bpf_probe_read() in interpreter and via exception table
      if program is JITed.
      ====================
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      0142fdc8
    • A
      selftests/bpf: Add kfree_skb raw_tp test · 580d656d
      Alexei Starovoitov 提交于
      Load basic cls_bpf program.
      Load raw_tracepoint program and attach to kfree_skb raw tracepoint.
      Trigger cls_bpf via prog_test_run.
      At the end of test_run kernel will call kfree_skb
      which will trigger trace_kfree_skb tracepoint.
      Which will call our raw_tracepoint program.
      Which will take that skb and will dump it into perf ring buffer.
      Check that user space received correct packet.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-12-ast@kernel.org
      580d656d
    • A
      bpf: Check types of arguments passed into helpers · a7658e1a
      Alexei Starovoitov 提交于
      Introduce new helper that reuses existing skb perf_event output
      implementation, but can be called from raw_tracepoint programs
      that receive 'struct sk_buff *' as tracepoint argument or
      can walk other kernel data structures to skb pointer.
      
      In order to do that teach verifier to resolve true C types
      of bpf helpers into in-kernel BTF ids.
      The type of kernel pointer passed by raw tracepoint into bpf
      program will be tracked by the verifier all the way until
      it's passed into helper function.
      For example:
      kfree_skb() kernel function calls trace_kfree_skb(skb, loc);
      bpf programs receives that skb pointer and may eventually
      pass it into bpf_skb_output() bpf helper which in-kernel is
      implemented via bpf_skb_event_output() kernel function.
      Its first argument in the kernel is 'struct sk_buff *'.
      The verifier makes sure that types match all the way.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-11-ast@kernel.org
      a7658e1a
    • A
      bpf: Add support for BTF pointers to x86 JIT · 3dec541b
      Alexei Starovoitov 提交于
      Pointer to BTF object is a pointer to kernel object or NULL.
      Such pointers can only be used by BPF_LDX instructions.
      The verifier changed their opcode from LDX|MEM|size
      to LDX|PROBE_MEM|size to make JITing easier.
      The number of entries in extable is the number of BPF_LDX insns
      that access kernel memory via "pointer to BTF type".
      Only these load instructions can fault.
      Since x86 extable is relative it has to be allocated in the same
      memory region as JITed code.
      Allocate it prior to last pass of JITing and let the last pass populate it.
      Pointer to extable in bpf_prog_aux is necessary to make page fault
      handling fast.
      Page fault handling is done in two steps:
      1. bpf_prog_kallsyms_find() finds BPF program that page faulted.
         It's done by walking rb tree.
      2. then extable for given bpf program is binary searched.
      This process is similar to how page faulting is done for kernel modules.
      The exception handler skips over faulting x86 instruction and
      initializes destination register with zero. This mimics exact
      behavior of bpf_probe_read (when probe_kernel_read faults dest is zeroed).
      
      JITs for other architectures can add support in similar way.
      Until then they will reject unknown opcode and fallback to interpreter.
      
      Since extable should be aligned and placed near JITed code
      make bpf_jit_binary_alloc() return 4 byte aligned image offset,
      so that extable aligning formula in bpf_int_jit_compile() doesn't need
      to rely on internal implementation of bpf_jit_binary_alloc().
      On x86 gcc defaults to 16-byte alignment for regular kernel functions
      due to better performance. JITed code may be aligned to 16 in the future,
      but it will use 4 in the meantime.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-10-ast@kernel.org
      3dec541b
    • A
      bpf: Add support for BTF pointers to interpreter · 2a02759e
      Alexei Starovoitov 提交于
      Pointer to BTF object is a pointer to kernel object or NULL.
      The memory access in the interpreter has to be done via probe_kernel_read
      to avoid page faults.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-9-ast@kernel.org
      2a02759e
    • A
      bpf: Attach raw_tp program with BTF via type name · ac4414b5
      Alexei Starovoitov 提交于
      BTF type id specified at program load time has all
      necessary information to attach that program to raw tracepoint.
      Use kernel type name to find raw tracepoint.
      
      Add missing CHECK_ATTR() condition.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-8-ast@kernel.org
      ac4414b5
    • A
      bpf: Implement accurate raw_tp context access via BTF · 9e15db66
      Alexei Starovoitov 提交于
      libbpf analyzes bpf C program, searches in-kernel BTF for given type name
      and stores it into expected_attach_type.
      The kernel verifier expects this btf_id to point to something like:
      typedef void (*btf_trace_kfree_skb)(void *, struct sk_buff *skb, void *loc);
      which represents signature of raw_tracepoint "kfree_skb".
      
      Then btf_ctx_access() matches ctx+0 access in bpf program with 'skb'
      and 'ctx+8' access with 'loc' arguments of "kfree_skb" tracepoint.
      In first case it passes btf_id of 'struct sk_buff *' back to the verifier core
      and 'void *' in second case.
      
      Then the verifier tracks PTR_TO_BTF_ID as any other pointer type.
      Like PTR_TO_SOCKET points to 'struct bpf_sock',
      PTR_TO_TCP_SOCK points to 'struct bpf_tcp_sock', and so on.
      PTR_TO_BTF_ID points to in-kernel structs.
      If 1234 is btf_id of 'struct sk_buff' in vmlinux's BTF
      then PTR_TO_BTF_ID#1234 points to one of in kernel skbs.
      
      When PTR_TO_BTF_ID#1234 is dereferenced (like r2 = *(u64 *)r1 + 32)
      the btf_struct_access() checks which field of 'struct sk_buff' is
      at offset 32. Checks that size of access matches type definition
      of the field and continues to track the dereferenced type.
      If that field was a pointer to 'struct net_device' the r2's type
      will be PTR_TO_BTF_ID#456. Where 456 is btf_id of 'struct net_device'
      in vmlinux's BTF.
      
      Such verifier analysis prevents "cheating" in BPF C program.
      The program cannot cast arbitrary pointer to 'struct sk_buff *'
      and access it. C compiler would allow type cast, of course,
      but the verifier will notice type mismatch based on BPF assembly
      and in-kernel BTF.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-7-ast@kernel.org
      9e15db66
    • A
      libbpf: Auto-detect btf_id of BTF-based raw_tracepoints · f75a697e
      Alexei Starovoitov 提交于
      It's a responsiblity of bpf program author to annotate the program
      with SEC("tp_btf/name") where "name" is a valid raw tracepoint.
      The libbpf will try to find "name" in vmlinux BTF and error out
      in case vmlinux BTF is not available or "name" is not found.
      If "name" is indeed a valid raw tracepoint then in-kernel BTF
      will have "btf_trace_##name" typedef that points to function
      prototype of that raw tracepoint. BTF description captures
      exact argument the kernel C code is passing into raw tracepoint.
      The kernel verifier will check the types while loading bpf program.
      
      libbpf keeps BTF type id in expected_attach_type, but since
      kernel ignores this attribute for tracing programs copy it
      into attach_btf_id attribute before loading.
      
      Later the kernel will use prog->attach_btf_id to select raw tracepoint
      during bpf_raw_tracepoint_open syscall command.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-6-ast@kernel.org
      f75a697e
    • A
      bpf: Add attach_btf_id attribute to program load · ccfe29eb
      Alexei Starovoitov 提交于
      Add attach_btf_id attribute to prog_load command.
      It's similar to existing expected_attach_type attribute which is
      used in several cgroup based program types.
      Unfortunately expected_attach_type is ignored for
      tracing programs and cannot be reused for new purpose.
      Hence introduce attach_btf_id to verify bpf programs against
      given in-kernel BTF type id at load time.
      It is strictly checked to be valid for raw_tp programs only.
      In a later patches it will become:
      btf_id == 0 semantics of existing raw_tp progs.
      btd_id > 0 raw_tp with BTF and additional type safety.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-5-ast@kernel.org
      ccfe29eb
    • A
      bpf: Process in-kernel BTF · 8580ac94
      Alexei Starovoitov 提交于
      If in-kernel BTF exists parse it and prepare 'struct btf *btf_vmlinux'
      for further use by the verifier.
      In-kernel BTF is trusted just like kallsyms and other build artifacts
      embedded into vmlinux.
      Yet run this BTF image through BTF verifier to make sure
      that it is valid and it wasn't mangled during the build.
      If in-kernel BTF is incorrect it means either gcc or pahole or kernel
      are buggy. In such case disallow loading BPF programs.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-4-ast@kernel.org
      8580ac94
    • A
      bpf: Add typecast to bpf helpers to help BTF generation · 7c6a469e
      Alexei Starovoitov 提交于
      When pahole converts dwarf to btf it emits only used types.
      Wrap existing bpf helper functions into typedef and use it in
      typecast to make gcc emits this type into dwarf.
      Then pahole will convert it to btf.
      The "btf_#name_of_helper" types will be used to figure out
      types of arguments of bpf helpers.
      The generated code before and after is the same.
      Only dwarf and btf sections are different.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-3-ast@kernel.org
      7c6a469e
    • A
      bpf: Add typecast to raw_tracepoints to help BTF generation · e8c423fb
      Alexei Starovoitov 提交于
      When pahole converts dwarf to btf it emits only used types.
      Wrap existing __bpf_trace_##template() function into
      btf_trace_##template typedef and use it in type cast to
      make gcc emits this type into dwarf. Then pahole will convert it to btf.
      The "btf_trace_" prefix will be used to identify BTF enabled raw tracepoints.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-2-ast@kernel.org
      e8c423fb
    • S
      bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack() · eac9153f
      Song Liu 提交于
      bpf stackmap with build-id lookup (BPF_F_STACK_BUILD_ID) can trigger A-A
      deadlock on rq_lock():
      
      rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
      [...]
      Call Trace:
       try_to_wake_up+0x1ad/0x590
       wake_up_q+0x54/0x80
       rwsem_wake+0x8a/0xb0
       bpf_get_stack+0x13c/0x150
       bpf_prog_fbdaf42eded9fe46_on_event+0x5e3/0x1000
       bpf_overflow_handler+0x60/0x100
       __perf_event_overflow+0x4f/0xf0
       perf_swevent_overflow+0x99/0xc0
       ___perf_sw_event+0xe7/0x120
       __schedule+0x47d/0x620
       schedule+0x29/0x90
       futex_wait_queue_me+0xb9/0x110
       futex_wait+0x139/0x230
       do_futex+0x2ac/0xa50
       __x64_sys_futex+0x13c/0x180
       do_syscall_64+0x42/0x100
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This can be reproduced by:
      1. Start a multi-thread program that does parallel mmap() and malloc();
      2. taskset the program to 2 CPUs;
      3. Attach bpf program to trace_sched_switch and gather stackmap with
         build-id, e.g. with trace.py from bcc tools:
         trace.py -U -p <pid> -s <some-bin,some-lib> t:sched:sched_switch
      
      A sample reproducer is attached at the end.
      
      This could also trigger deadlock with other locks that are nested with
      rq_lock.
      
      Fix this by checking whether irqs are disabled. Since rq_lock and all
      other nested locks are irq safe, it is safe to do up_read() when irqs are
      not disable. If the irqs are disabled, postpone up_read() in irq_work.
      
      Fixes: 615755a7 ("bpf: extend stackmap to save binary_build_id+offset instead of address")
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191014171223.357174-1-songliubraving@fb.com
      
      Reproducer:
      ============================ 8< ============================
      
      char *filename;
      
      void *worker(void *p)
      {
              void *ptr;
              int fd;
              char *pptr;
      
              fd = open(filename, O_RDONLY);
              if (fd < 0)
                      return NULL;
              while (1) {
                      struct timespec ts = {0, 1000 + rand() % 2000};
      
                      ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                      usleep(1);
                      if (ptr == MAP_FAILED) {
                              printf("failed to mmap\n");
                              break;
                      }
                      munmap(ptr, 4096 * 64);
                      usleep(1);
                      pptr = malloc(1);
                      usleep(1);
                      pptr[0] = 1;
                      usleep(1);
                      free(pptr);
                      usleep(1);
                      nanosleep(&ts, NULL);
              }
              close(fd);
              return NULL;
      }
      
      int main(int argc, char *argv[])
      {
              void *ptr;
              int i;
              pthread_t threads[THREAD_COUNT];
      
              if (argc < 2)
                      return 0;
      
              filename = argv[1];
      
              for (i = 0; i < THREAD_COUNT; i++) {
                      if (pthread_create(threads + i, NULL, worker, NULL)) {
                              fprintf(stderr, "Error creating thread\n");
                              return 0;
                      }
              }
      
              for (i = 0; i < THREAD_COUNT; i++)
                      pthread_join(threads[i], NULL);
              return 0;
      }
      ============================ 8< ============================
      eac9153f
  4. 16 10月, 2019 15 次提交