1. 04 5月, 2018 5 次提交
    • J
      bpf: add faked "ending" subprog · 4cb3d99c
      Jiong Wang 提交于
      There are quite a few code snippet like the following in verifier:
      
             subprog_start = 0;
             if (env->subprog_cnt == cur_subprog + 1)
                     subprog_end = insn_cnt;
             else
                     subprog_end = env->subprog_info[cur_subprog + 1].start;
      
      The reason is there is no marker in subprog_info array to tell the end of
      it.
      
      We could resolve this issue by introducing a faked "ending" subprog.
      The special "ending" subprog is with "insn_cnt" as start offset, so it is
      serving as the end mark whenever we iterate over all subprogs.
      Signed-off-by: NJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      4cb3d99c
    • J
      bpf: centre subprog information fields · 9c8105bd
      Jiong Wang 提交于
      It is better to centre all subprog information fields into one structure.
      This structure could later serve as function node in call graph.
      Signed-off-by: NJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      9c8105bd
    • J
      bpf: unify main prog and subprog · f910cefa
      Jiong Wang 提交于
      Currently, verifier treat main prog and subprog differently. All subprogs
      detected are kept in env->subprog_starts while main prog is not kept there.
      Instead, main prog is implicitly defined as the prog start at 0.
      
      There is actually no difference between main prog and subprog, it is better
      to unify them, and register all progs detected into env->subprog_starts.
      
      This could also help simplifying some code logic.
      Signed-off-by: NJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      f910cefa
    • D
      bpf: implement ld_abs/ld_ind in native bpf · e0cea7ce
      Daniel Borkmann 提交于
      The main part of this work is to finally allow removal of LD_ABS
      and LD_IND from the BPF core by reimplementing them through native
      eBPF instead. Both LD_ABS/LD_IND were carried over from cBPF and
      keeping them around in native eBPF caused way more trouble than
      actually worth it. To just list some of the security issues in
      the past:
      
        * fdfaf64e ("x86: bpf_jit: support negative offsets")
        * 35607b02 ("sparc: bpf_jit: fix loads from negative offsets")
        * e0ee9c12 ("x86: bpf_jit: fix two bugs in eBPF JIT compiler")
        * 07aee943 ("bpf, sparc: fix usage of wrong reg for load_skb_regs after call")
        * 6d59b7db ("bpf, s390x: do not reload skb pointers in non-skb context")
        * 87338c8e ("bpf, ppc64: do not reload skb pointers in non-skb context")
      
      For programs in native eBPF, LD_ABS/LD_IND are pretty much legacy
      these days due to their limitations and more efficient/flexible
      alternatives that have been developed over time such as direct
      packet access. LD_ABS/LD_IND only cover 1/2/4 byte loads into a
      register, the load happens in host endianness and its exception
      handling can yield unexpected behavior. The latter is explained
      in depth in f6b1b3bf ("bpf: fix subprog verifier bypass by
      div/mod by 0 exception") with similar cases of exceptions we had.
      In native eBPF more recent program types will disable LD_ABS/LD_IND
      altogether through may_access_skb() in verifier, and given the
      limitations in terms of exception handling, it's also disabled
      in programs that use BPF to BPF calls.
      
      In terms of cBPF, the LD_ABS/LD_IND is used in networking programs
      to access packet data. It is not used in seccomp-BPF but programs
      that use it for socket filtering or reuseport for demuxing with
      cBPF. This is mostly relevant for applications that have not yet
      migrated to native eBPF.
      
      The main complexity and source of bugs in LD_ABS/LD_IND is coming
      from their implementation in the various JITs. Most of them keep
      the model around from cBPF times by implementing a fastpath written
      in asm. They use typically two from the BPF program hidden CPU
      registers for caching the skb's headlen (skb->len - skb->data_len)
      and skb->data. Throughout the JIT phase this requires to keep track
      whether LD_ABS/LD_IND are used and if so, the two registers need
      to be recached each time a BPF helper would change the underlying
      packet data in native eBPF case. At least in eBPF case, available
      CPU registers are rare and the additional exit path out of the
      asm written JIT helper makes it also inflexible since not all
      parts of the JITer are in control from plain C. A LD_ABS/LD_IND
      implementation in eBPF therefore allows to significantly reduce
      the complexity in JITs with comparable performance results for
      them, e.g.:
      
      test_bpf             tcpdump port 22             tcpdump complex
      x64      - before    15 21 10                    14 19  18
               - after      7 10 10                     7 10  15
      arm64    - before    40 91 92                    40 91 151
               - after     51 64 73                    51 62 113
      
      For cBPF we now track any usage of LD_ABS/LD_IND in bpf_convert_filter()
      and cache the skb's headlen and data in the cBPF prologue. The
      BPF_REG_TMP gets remapped from R8 to R2 since it's mainly just
      used as a local temporary variable. This allows to shrink the
      image on x86_64 also for seccomp programs slightly since mapping
      to %rsi is not an ereg. In callee-saved R8 and R9 we now track
      skb data and headlen, respectively. For normal prologue emission
      in the JITs this does not add any extra instructions since R8, R9
      are pushed to stack in any case from eBPF side. cBPF uses the
      convert_bpf_ld_abs() emitter which probes the fast path inline
      already and falls back to bpf_skb_load_helper_{8,16,32}() helper
      relying on the cached skb data and headlen as well. R8 and R9
      never need to be reloaded due to bpf_helper_changes_pkt_data()
      since all skb access in cBPF is read-only. Then, for the case
      of native eBPF, we use the bpf_gen_ld_abs() emitter, which calls
      the bpf_skb_load_helper_{8,16,32}_no_cache() helper unconditionally,
      does neither cache skb data and headlen nor has an inlined fast
      path. The reason for the latter is that native eBPF does not have
      any extra registers available anyway, but even if there were, it
      avoids any reload of skb data and headlen in the first place.
      Additionally, for the negative offsets, we provide an alternative
      bpf_skb_load_bytes_relative() helper in eBPF which operates
      similarly as bpf_skb_load_bytes() and allows for more flexibility.
      Tested myself on x64, arm64, s390x, from Sandipan on ppc64.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      e0cea7ce
    • B
      bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP · fbfc504a
      Björn Töpel 提交于
      The xskmap is yet another BPF map, very much inspired by
      dev/cpu/sockmap, and is a holder of AF_XDP sockets. A user application
      adds AF_XDP sockets into the map, and by using the bpf_redirect_map
      helper, an XDP program can redirect XDP frames to an AF_XDP socket.
      
      Note that a socket that is bound to certain ifindex/queue index will
      *only* accept XDP frames from that netdev/queue index. If an XDP
      program tries to redirect from a netdev/queue index other than what
      the socket is bound to, the frame will not be received on the socket.
      
      A socket can reside in multiple maps.
      
      v3: Fixed race and simplified code.
      v2: Removed one indirection in map lookup.
      Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      fbfc504a
  2. 30 4月, 2018 2 次提交
  3. 29 4月, 2018 5 次提交
    • Y
      bpf/verifier: improve register value range tracking with ARSH · 9cbe1f5a
      Yonghong Song 提交于
      When helpers like bpf_get_stack returns an int value
      and later on used for arithmetic computation, the LSH and ARSH
      operations are often required to get proper sign extension into
      64-bit. For example, without this patch:
          54: R0=inv(id=0,umax_value=800)
          54: (bf) r8 = r0
          55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
          55: (67) r8 <<= 32
          56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
          56: (c7) r8 s>>= 32
          57: R8=inv(id=0)
      With this patch:
          54: R0=inv(id=0,umax_value=800)
          54: (bf) r8 = r0
          55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
          55: (67) r8 <<= 32
          56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
          56: (c7) r8 s>>= 32
          57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
      With better range of "R8", later on when "R8" is added to other register,
      e.g., a map pointer or scalar-value register, the better register
      range can be derived and verifier failure may be avoided.
      
      In our later example,
          ......
          usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
          if (usize < 0)
              return 0;
          ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
          ......
      Without improving ARSH value range tracking, the register representing
      "max_len - usize" will have smin_value equal to S64_MIN and will be
      rejected by verifier.
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      9cbe1f5a
    • Y
      bpf: remove never-hit branches in verifier adjust_scalar_min_max_vals · afbe1a5b
      Yonghong Song 提交于
      In verifier function adjust_scalar_min_max_vals,
      when src_known is false and the opcode is BPF_LSH/BPF_RSH,
      early return will happen in the function. So remove
      the branch in handling BPF_LSH/BPF_RSH when src_known is false.
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      afbe1a5b
    • Y
      bpf/verifier: refine retval R0 state for bpf_get_stack helper · 849fa506
      Yonghong Song 提交于
      The special property of return values for helpers bpf_get_stack
      and bpf_probe_read_str are captured in verifier.
      Both helpers return a negative error code or
      a length, which is equal to or smaller than the buffer
      size argument. This additional information in the
      verifier can avoid the condition such as "retval > bufsize"
      in the bpf program. For example, for the code blow,
          usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
          if (usize < 0 || usize > max_len)
              return 0;
      The verifier may have the following errors:
          52: (85) call bpf_get_stack#65
           R0=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R1_w=ctx(id=0,off=0,imm=0)
           R2_w=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R3_w=inv800 R4_w=inv256
           R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
           R9_w=inv800 R10=fp0,call_-1
          53: (bf) r8 = r0
          54: (bf) r1 = r8
          55: (67) r1 <<= 32
          56: (bf) r2 = r1
          57: (77) r2 >>= 32
          58: (25) if r2 > 0x31f goto pc+33
           R0=inv(id=0) R1=inv(id=0,smax_value=9223372032559808512,
                               umax_value=18446744069414584320,
                               var_off=(0x0; 0xffffffff00000000))
           R2=inv(id=0,umax_value=799,var_off=(0x0; 0x3ff))
           R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
           R8=inv(id=0) R9=inv800 R10=fp0,call_-1
          59: (1f) r9 -= r8
          60: (c7) r1 s>>= 32
          61: (bf) r2 = r7
          62: (0f) r2 += r1
          math between map_value pointer and register with unbounded
          min value is not allowed
      The failure is due to llvm compiler optimization where register "r2",
      which is a copy of "r1", is tested for condition while later on "r1"
      is used for map_ptr operation. The verifier is not able to track such
      inst sequence effectively.
      
      Without the "usize > max_len" condition, there is no llvm optimization
      and the below generated code passed verifier:
          52: (85) call bpf_get_stack#65
           R0=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R1_w=ctx(id=0,off=0,imm=0)
           R2_w=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R3_w=inv800 R4_w=inv256
           R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
           R9_w=inv800 R10=fp0,call_-1
          53: (b7) r1 = 0
          54: (bf) r8 = r0
          55: (67) r8 <<= 32
          56: (c7) r8 s>>= 32
          57: (6d) if r1 s> r8 goto pc+24
           R0=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
           R1=inv0 R6=ctx(id=0,off=0,imm=0)
           R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
           R8=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R9=inv800
           R10=fp0,call_-1
          58: (bf) r2 = r7
          59: (0f) r2 += r8
          60: (1f) r9 -= r8
          61: (bf) r1 = r6
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      849fa506
    • Y
      bpf: add bpf_get_stack helper · c195651e
      Yonghong Song 提交于
      Currently, stackmap and bpf_get_stackid helper are provided
      for bpf program to get the stack trace. This approach has
      a limitation though. If two stack traces have the same hash,
      only one will get stored in the stackmap table,
      so some stack traces are missing from user perspective.
      
      This patch implements a new helper, bpf_get_stack, will
      send stack traces directly to bpf program. The bpf program
      is able to see all stack traces, and then can do in-kernel
      processing or send stack traces to user space through
      shared map or bpf_perf_event_output.
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      c195651e
    • Y
      bpf: change prototype for stack_map_get_build_id_offset · 5f412632
      Yonghong Song 提交于
      This patch didn't incur functionality change. The function prototype
      got changed so that the same function can be reused later.
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      5f412632
  4. 27 4月, 2018 1 次提交
  5. 25 4月, 2018 1 次提交
    • P
      bpf: allow map helpers access to map values directly · d71962f3
      Paul Chaignon 提交于
      Helpers that expect ARG_PTR_TO_MAP_KEY and ARG_PTR_TO_MAP_VALUE can only
      access stack and packet memory.  Allow these helpers to directly access
      map values by passing registers of type PTR_TO_MAP_VALUE.
      
      This change removes the need for an extra copy to the stack when using a
      map value to perform a second map lookup, as in the following:
      
      struct bpf_map_def SEC("maps") infobyreq = {
          .type = BPF_MAP_TYPE_HASHMAP,
          .key_size = sizeof(struct request *),
          .value_size = sizeof(struct info_t),
          .max_entries = 1024,
      };
      struct bpf_map_def SEC("maps") counts = {
          .type = BPF_MAP_TYPE_HASHMAP,
          .key_size = sizeof(struct info_t),
          .value_size = sizeof(u64),
          .max_entries = 1024,
      };
      SEC("kprobe/blk_account_io_start")
      int bpf_blk_account_io_start(struct pt_regs *ctx)
      {
          struct info_t *info = bpf_map_lookup_elem(&infobyreq, &ctx->di);
          u64 *count = bpf_map_lookup_elem(&counts, info);
          (*count)++;
      }
      Signed-off-by: NPaul Chaignon <paul.chaignon@orange.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      d71962f3
  6. 24 4月, 2018 3 次提交
    • J
      bpf: sockmap, fix double page_put on ENOMEM error in redirect path · 4fcfdfb8
      John Fastabend 提交于
      In the case where the socket memory boundary is hit the redirect
      path returns an ENOMEM error. However, before checking for this
      condition the redirect scatterlist buffer is setup with a valid
      page and length. This is never unwound so when the buffers are
      released latter in the error path we do a put_page() and clear
      the scatterlist fields. But, because the initial error happens
      before completing the scatterlist buffer we end up with both the
      original buffer and the redirect buffer pointing to the same page
      resulting in duplicate put_page() calls.
      
      To fix this simply move the initial configuration of the redirect
      scatterlist buffer below the sock memory check.
      
      Found this while running TCP_STREAM test with netperf using Cilium.
      
      Fixes: fa246693 ("bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      4fcfdfb8
    • J
      bpf: sockmap, sk_wait_event needed to handle blocking cases · e20f7334
      John Fastabend 提交于
      In the recvmsg handler we need to add a wait event to support the
      blocking use cases. Without this we return zero and may confuse
      user applications. In the wait event any data received on the
      sk either via sk_receive_queue or the psock ingress list will
      wake up the sock.
      
      Fixes: fa246693 ("bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      e20f7334
    • J
      bpf: sockmap, map_release does not hold refcnt for pinned maps · ba6b8de4
      John Fastabend 提交于
      Relying on map_release hook to decrement the reference counts when a
      map is removed only works if the map is not being pinned. In the
      pinned case the ref is decremented immediately and the BPF programs
      released. After this BPF programs may not be in-use which is not
      what the user would expect.
      
      This patch moves the release logic into bpf_map_put_uref() and brings
      sockmap in-line with how a similar case is handled in prog array maps.
      
      Fixes: 3d9e9526 ("bpf: sockmap, fix leaking maps with attached but not detached progs")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      ba6b8de4
  7. 23 4月, 2018 1 次提交
  8. 21 4月, 2018 2 次提交
    • K
      fork: unconditionally clear stack on fork · e01e8063
      Kees Cook 提交于
      One of the classes of kernel stack content leaks[1] is exposing the
      contents of prior heap or stack contents when a new process stack is
      allocated.  Normally, those stacks are not zeroed, and the old contents
      remain in place.  In the face of stack content exposure flaws, those
      contents can leak to userspace.
      
      Fixing this will make the kernel no longer vulnerable to these flaws, as
      the stack will be wiped each time a stack is assigned to a new process.
      There's not a meaningful change in runtime performance; it almost looks
      like it provides a benefit.
      
      Performing back-to-back kernel builds before:
      	Run times: 157.86 157.09 158.90 160.94 160.80
      	Mean: 159.12
      	Std Dev: 1.54
      
      and after:
      	Run times: 159.31 157.34 156.71 158.15 160.81
      	Mean: 158.46
      	Std Dev: 1.46
      
      Instead of making this a build or runtime config, Andy Lutomirski
      recommended this just be enabled by default.
      
      [1] A noisy search for many kinds of stack content leaks can be seen here:
      https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak
      
      I did some more with perf and cycle counts on running 100,000 execs of
      /bin/true.
      
      before:
      Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
      Mean:  221015379122.60
      Std Dev: 4662486552.47
      
      after:
      Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
      Mean:  217745009865.40
      Std Dev: 5935559279.99
      
      It continues to look like it's faster, though the deviation is rather
      wide, but I'm not sure what I could do that would be less noisy.  I'm
      open to ideas!
      
      Link: http://lkml.kernel.org/r/20180221021659.GA37073@beastSigned-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e01e8063
    • J
      bpf: sockmap remove dead check · 6ab690aa
      Jann Horn 提交于
      Remove dead code that bails on `attr->value_size > KMALLOC_MAX_SIZE` - the
      previous check already bails on `attr->value_size != 4`.
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      6ab690aa
  9. 20 4月, 2018 7 次提交
    • M
      bpf: btf: Add pretty print support to the basic arraymap · a26ca7c9
      Martin KaFai Lau 提交于
      This patch adds pretty print support to the basic arraymap.
      Support for other bpf maps can be added later.
      
      This patch adds new attrs to the BPF_MAP_CREATE command to allow
      specifying the btf_fd, btf_key_id and btf_value_id.  The
      BPF_MAP_CREATE can then associate the btf to the map if
      the creating map supports BTF.
      
      A BTF supported map needs to implement two new map ops,
      map_seq_show_elem() and map_check_btf().  This patch has
      implemented these new map ops for the basic arraymap.
      
      It also adds file_operations, bpffs_map_fops, to the pinned
      map such that the pinned map can be opened and read.
      After that, the user has an intuitive way to do
      "cat bpffs/pathto/a-pinned-map" instead of getting
      an error.
      
      bpffs_map_fops should not be extended further to support
      other operations.  Other operations (e.g. write/key-lookup...)
      should be realized by the userspace tools (e.g. bpftool) through
      the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc.
      Follow up patches will allow the userspace to obtain
      the BTF from a map-fd.
      
      Here is a sample output when reading a pinned arraymap
      with the following map's value:
      
      struct map_value {
      	int count_a;
      	int count_b;
      };
      
      cat /sys/fs/bpf/pinned_array_map:
      
      0: {1,2}
      1: {3,4}
      2: {5,6}
      ...
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      a26ca7c9
    • M
      bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd · 60197cfb
      Martin KaFai Lau 提交于
      This patch adds BPF_OBJ_GET_INFO_BY_FD support to BTF fd.
      The original BTF data, which was used to create the BTF fd during
      the earlier BPF_BTF_LOAD call, will be returned.
      
      The userspace is expected to allocate buffer
      to info.info and the buffer size is set to info.info_len before
      calling BPF_OBJ_GET_INFO_BY_FD.
      
      The original BTF data is copied to the userspace buffer (info.info).
      Only upto the user's specified info.info_len will be copied.
      
      The original BTF data size is set to info.info_len.  The userspace
      needs to check if it is bigger than its allocated buffer size.
      If it is, the userspace should realloc with the kernel-returned
      info.info_len and call the BPF_OBJ_GET_INFO_BY_FD again.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      60197cfb
    • M
      bpf: btf: Add BPF_BTF_LOAD command · f56a653c
      Martin KaFai Lau 提交于
      This patch adds a BPF_BTF_LOAD command which
      1) loads and verifies the BTF (implemented in earlier patches)
      2) returns a BTF fd to userspace.  In the next patch, the
         BTF fd can be specified during BPF_MAP_CREATE.
      
      It currently limits to CAP_SYS_ADMIN.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      f56a653c
    • M
      bpf: btf: Add pretty print capability for data with BTF type info · b00b8dae
      Martin KaFai Lau 提交于
      This patch adds pretty print capability for data with BTF type info.
      The current usage is to allow pretty print for a BPF map.
      
      The next few patches will allow a read() on a pinned map with BTF
      type info for its key and value.
      
      This patch uses the seq_printf() infra.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      b00b8dae
    • M
      bpf: btf: Check members of struct/union · 179cde8c
      Martin KaFai Lau 提交于
      This patch checks a few things of struct's members:
      
      1) It has a valid size (e.g. a "const void" is invalid)
      2) A member's size (+ its member's offset) does not exceed
         the containing struct's size.
      3) The member's offset satisfies the alignment requirement
      
      The above can only be done after the needs_resolve member's type
      is resolved.  Hence, the above is done together in
      btf_struct_resolve().
      
      Each possible member's type (e.g. int, enum, modifier...) implements
      the check_member() ops which will be called from btf_struct_resolve().
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      179cde8c
    • M
      bpf: btf: Validate type reference · eb3f595d
      Martin KaFai Lau 提交于
      After collecting all btf_type in the first pass in an earlier patch,
      the second pass (in this patch) can validate the reference types
      (e.g. the referring type does exist and it does not refer to itself).
      
      While checking the reference type, it also gathers other information (e.g.
      the size of an array).  This info will be useful in checking the
      struct's members in a later patch.  They will also be useful in doing
      pretty print later.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      eb3f595d
    • M
      bpf: btf: Introduce BPF Type Format (BTF) · 69b693f0
      Martin KaFai Lau 提交于
      This patch introduces BPF type Format (BTF).
      
      BTF (BPF Type Format) is the meta data format which describes
      the data types of BPF program/map.  Hence, it basically focus
      on the C programming language which the modern BPF is primary
      using.  The first use case is to provide a generic pretty print
      capability for a BPF map.
      
      BTF has its root from CTF (Compact C-Type format).  To simplify
      the handling of BTF data, BTF removes the differences between
      small and big type/struct-member.  Hence, BTF consistently uses u32
      instead of supporting both "one u16" and "two u32 (+padding)" in
      describing type and struct-member.
      
      It also raises the number of types (and functions) limit
      from 0x7fff to 0x7fffffff.
      
      Due to the above changes,  the format is not compatible to CTF.
      Hence, BTF starts with a new BTF_MAGIC and version number.
      
      This patch does the first verification pass to the BTF.  The first
      pass checks:
      1. meta-data size (e.g. It does not go beyond the total btf's size)
      2. name_offset is valid
      3. Each BTF_KIND (e.g. int, enum, struct....) does its
         own check of its meta-data.
      
      Some other checks, like checking a struct's member is referring
      to a valid type, can only be done in the second pass.  The second
      verification pass will be implemented in the next patch.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      69b693f0
  10. 19 4月, 2018 1 次提交
  11. 17 4月, 2018 12 次提交