1. 29 3月, 2020 4 次提交
  2. 28 3月, 2020 8 次提交
  3. 27 3月, 2020 3 次提交
  4. 26 3月, 2020 6 次提交
    • J
      bpf: Test_verifier, #70 error message updates for 32-bit right shift · aa131ed4
      John Fastabend 提交于
      After changes to add update_reg_bounds after ALU ops and adding ALU32
      bounds tracking the error message is changed in the 32-bit right shift
      tests.
      
      Test "#70/u bounds check after 32-bit right shift with 64-bit input FAIL"
      now fails with,
      
      Unexpected error message!
      	EXP: R0 invalid mem access
      	RES: func#0 @0
      
      7: (b7) r1 = 2
      8: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP2 R10=fp0 fp-8_w=mmmmmmmm
      8: (67) r1 <<= 31
      9: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP4294967296 R10=fp0 fp-8_w=mmmmmmmm
      9: (74) w1 >>= 31
      10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP0 R10=fp0 fp-8_w=mmmmmmmm
      10: (14) w1 -= 2
      11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP4294967294 R10=fp0 fp-8_w=mmmmmmmm
      11: (0f) r0 += r1
      math between map_value pointer and 4294967294 is not allowed
      
      And test "#70/p bounds check after 32-bit right shift with 64-bit input
      FAIL" now fails with,
      
      Unexpected error message!
      	EXP: R0 invalid mem access
      	RES: func#0 @0
      
      7: (b7) r1 = 2
      8: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv2 R10=fp0 fp-8_w=mmmmmmmm
      8: (67) r1 <<= 31
      9: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv4294967296 R10=fp0 fp-8_w=mmmmmmmm
      9: (74) w1 >>= 31
      10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv0 R10=fp0 fp-8_w=mmmmmmmm
      10: (14) w1 -= 2
      11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv4294967294 R10=fp0 fp-8_w=mmmmmmmm
      11: (0f) r0 += r1
      last_idx 11 first_idx 0
      regs=2 stack=0 before 10: (14) w1 -= 2
      regs=2 stack=0 before 9: (74) w1 >>= 31
      regs=2 stack=0 before 8: (67) r1 <<= 31
      regs=2 stack=0 before 7: (b7) r1 = 2
      math between map_value pointer and 4294967294 is not allowed
      
      Before this series we did not trip the "math between map_value pointer..."
      error because check_reg_sane_offset is never called in
      adjust_ptr_min_max_vals(). Instead we have a register state that looks
      like this at line 11*,
      
      11: R0_w=map_value(id=0,off=0,ks=8,vs=8,
                         smin_value=0,smax_value=0,
                         umin_value=0,umax_value=0,
                         var_off=(0x0; 0x0))
          R1_w=invP(id=0,
                    smin_value=0,smax_value=4294967295,
                    umin_value=0,umax_value=4294967295,
                    var_off=(0xfffffffe; 0x0))
          R10=fp(id=0,off=0,
                 smin_value=0,smax_value=0,
                 umin_value=0,umax_value=0,
                 var_off=(0x0; 0x0)) fp-8_w=mmmmmmmm
      11: (0f) r0 += r1
      
      In R1 'smin_val != smax_val' yet we have a tnum_const as seen
      by 'var_off(0xfffffffe; 0x0))' with a 0x0 mask. So we hit this check
      in adjust_ptr_min_max_vals()
      
       if ((known && (smin_val != smax_val || umin_val != umax_val)) ||
            smin_val > smax_val || umin_val > umax_val) {
             /* Taint dst register if offset had invalid bounds derived from
              * e.g. dead branches.
              */
             __mark_reg_unknown(env, dst_reg);
             return 0;
       }
      
      So we don't throw an error here and instead only throw an error
      later in the verification when the memory access is made.
      
      The root cause in verifier without alu32 bounds tracking is having
      'umin_value = 0' and 'umax_value = U64_MAX' from BPF_SUB which we set
      when 'umin_value < umax_val' here,
      
       if (dst_reg->umin_value < umax_val) {
          /* Overflow possible, we know nothing */
          dst_reg->umin_value = 0;
          dst_reg->umax_value = U64_MAX;
       } else { ...}
      
      Later in adjust_calar_min_max_vals we previously did a
      coerce_reg_to_size() which will clamp the U64_MAX to U32_MAX by
      truncating to 32bits. But either way without a call to update_reg_bounds
      the less precise bounds tracking will fall out of the alu op
      verification.
      
      After latest changes we now exit adjust_scalar_min_max_vals with the
      more precise umin value, due to zero extension propogating bounds from
      alu32 bounds into alu64 bounds and then calling update_reg_bounds.
      This then causes the verifier to trigger an earlier error and we get
      the error in the output above.
      
      This patch updates tests to reflect new error message.
      
      * I have a local patch to print entire verifier state regardless if we
       believe it is a constant so we can get a full picture of the state.
       Usually if tnum_is_const() then bounds are also smin=smax, etc. but
       this is not always true and is a bit subtle. Being able to see these
       states helps understand dataflow imo. Let me know if we want something
       similar upstream.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/158507161475.15666.3061518385241144063.stgit@john-Precision-5820-Tower
      aa131ed4
    • J
      bpf: Verifer, adjust_scalar_min_max_vals to always call update_reg_bounds() · 294f2fc6
      John Fastabend 提交于
      Currently, for all op verification we call __red_deduce_bounds() and
      __red_bound_offset() but we only call __update_reg_bounds() in bitwise
      ops. However, we could benefit from calling __update_reg_bounds() in
      BPF_ADD, BPF_SUB, and BPF_MUL cases as well.
      
      For example, a register with state 'R1_w=invP0' when we subtract from
      it,
      
       w1 -= 2
      
      Before coerce we will now have an smin_value=S64_MIN, smax_value=U64_MAX
      and unsigned bounds umin_value=0, umax_value=U64_MAX. These will then
      be clamped to S32_MIN, U32_MAX values by coerce in the case of alu32 op
      as done in above example. However tnum will be a constant because the
      ALU op is done on a constant.
      
      Without update_reg_bounds() we have a scenario where tnum is a const
      but our unsigned bounds do not reflect this. By calling update_reg_bounds
      after coerce to 32bit we further refine the umin_value to U64_MAX in the
      alu64 case or U32_MAX in the alu32 case above.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/158507151689.15666.566796274289413203.stgit@john-Precision-5820-Tower
      294f2fc6
    • J
      bpf: Verifer, refactor adjust_scalar_min_max_vals · 07cd2631
      John Fastabend 提交于
      Pull per op ALU logic into individual functions. We are about to add
      u32 versions of each of these by pull them out the code gets a bit
      more readable here and nicer in the next patch.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/158507149518.15666.15672349629329072411.stgit@john-Precision-5820-Tower
      07cd2631
    • S
      libbpf: Don't allocate 16M for log buffer by default · 8395f320
      Stanislav Fomichev 提交于
      For each prog/btf load we allocate and free 16 megs of verifier buffer.
      On production systems it doesn't really make sense because the
      programs/btf have gone through extensive testing and (mostly) guaranteed
      to successfully load.
      
      Let's assume successful case by default and skip buffer allocation
      on the first try. If there is an error, start with BPF_LOG_BUF_SIZE
      and double it on each ENOSPC iteration.
      
      v3:
      * Return -ENOMEM when can't allocate log buffer (Andrii Nakryiko)
      
      v2:
      * Don't allocate the buffer at all on the first try (Andrii Nakryiko)
      Signed-off-by: NStanislav Fomichev <sdf@google.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200325195521.112210-1-sdf@google.com
      8395f320
    • T
      libbpf: Remove unused parameter `def` to get_map_field_int · 9fc9aad9
      Tobias Klauser 提交于
      Has been unused since commit ef99b02b ("libbpf: capture value in BTF
      type info for BTF-defined map defs").
      Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: NQuentin Monnet <quentin@isovalent.com>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200325113655.19341-1-tklauser@distanz.ch
      9fc9aad9
    • A
      bpf: Document bpf_inspect drgn tool · 8c061865
      Andrey Ignatov 提交于
      It's a follow-up for discussion in [1].
      
      drgn tool bpf_inspect.py was merged to drgn repo in [2]. Document it
      in kernel tree to make BPF developers aware that the tool exists and
      can help with getting BPF state unavailable via UAPI.
      
      For now it's just one tool but the doc is written in a way that allows
      to cover more tools in the future if needed.
      
      Please refer to the doc itself for more details.
      
      The patch was tested by `make htmldocs` and sanity-checking that
      resulting html looks good.
      
      v2 -> v3:
        - two sections: "Description" and "Getting started" (Daniel);
        - add examples in "Getting started" section (Daniel);
        - add "Customization" section to show how tool can be customized.
      
      v1 -> v2:
        - better "BPF drgn tools" section (Alexei)
      
        [1] https://lore.kernel.org/bpf/20200228201514.GB51456@rdna-mbp/T/#mefed65e8a98116bd5d07d09a570a3eac46724951
        [2] https://github.com/osandov/drgn/pull/49Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200324185135.1431038-1-rdna@fb.com
      8c061865
  5. 24 3月, 2020 4 次提交
  6. 21 3月, 2020 1 次提交
  7. 20 3月, 2020 6 次提交
    • Y
      bpf, tcp: Make tcp_bpf_recvmsg static · c0fd336e
      YueHaibing 提交于
      After commit f747632b ("bpf: sockmap: Move generic sockmap
      hooks from BPF TCP"), tcp_bpf_recvmsg() is not used out of
      tcp_bpf.c, so make it static and remove it from tcp.h. Also move
      it to BPF_STREAM_PARSER #ifdef to fix unused function warnings.
      
      Fixes: f747632b ("bpf: sockmap: Move generic sockmap hooks from BPF TCP")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200320023426.60684-3-yuehaibing@huawei.com
      c0fd336e
    • Y
      bpf, tcp: Fix unused function warnings · a2652798
      YueHaibing 提交于
      If BPF_STREAM_PARSER is not set, gcc warns:
      
        net/ipv4/tcp_bpf.c:483:12: warning: 'tcp_bpf_sendpage' defined but not used [-Wunused-function]
        net/ipv4/tcp_bpf.c:395:12: warning: 'tcp_bpf_sendmsg' defined but not used [-Wunused-function]
        net/ipv4/tcp_bpf.c:13:13: warning: 'tcp_bpf_stream_read' defined but not used [-Wunused-function]
      
      Moves the unused functions into the #ifdef CONFIG_BPF_STREAM_PARSER.
      
      Fixes: f747632b ("bpf: sockmap: Move generic sockmap hooks from BPF TCP")
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: NLorenz Bauer <lmb@cloudflare.com>
      Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com>
      Acked-by: NYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200320023426.60684-2-yuehaibing@huawei.com
      a2652798
    • M
      bpftool: Add struct_ops support · 65c93628
      Martin KaFai Lau 提交于
      This patch adds struct_ops support to the bpftool.
      
      To recap a bit on the recent bpf_struct_ops feature on the kernel side:
      It currently supports "struct tcp_congestion_ops" to be implemented
      in bpf.  At a high level, bpf_struct_ops is struct_ops map populated
      with a number of bpf progs.  bpf_struct_ops currently supports the
      "struct tcp_congestion_ops".  However, the bpf_struct_ops design is
      generic enough that other kernel struct ops can be supported in
      the future.
      
      Although struct_ops is map+progs at a high lever, there are differences
      in details.  For example,
      1) After registering a struct_ops, the struct_ops is held by the kernel
         subsystem (e.g. tcp-cc).  Thus, there is no need to pin a
         struct_ops map or its progs in order to keep them around.
      2) To iterate all struct_ops in a system, it iterates all maps
         in type BPF_MAP_TYPE_STRUCT_OPS.  BPF_MAP_TYPE_STRUCT_OPS is
         the current usual filter.  In the future, it may need to
         filter by other struct_ops specific properties.  e.g. filter by
         tcp_congestion_ops or other kernel subsystem ops in the future.
      3) struct_ops requires the running kernel having BTF info.  That allows
         more flexibility in handling other kernel structs.  e.g. it can
         always dump the latest bpf_map_info.
      4) Also, "struct_ops" command is not intended to repeat all features
         already provided by "map" or "prog".  For example, if there really
         is a need to pin the struct_ops map, the user can use the "map" cmd
         to do that.
      
      While the first attempt was to reuse parts from map/prog.c,  it ended up
      not a lot to share.  The only obvious item is the map_parse_fds() but
      that still requires modifications to accommodate struct_ops map specific
      filtering (for the immediate and the future needs).  Together with the
      earlier mentioned differences, it is better to part away from map/prog.c.
      
      The initial set of subcmds are, register, unregister, show, and dump.
      
      For register, it registers all struct_ops maps that can be found in an
      obj file.  Option can be added in the future to specify a particular
      struct_ops map.  Also, the common bpf_tcp_cc is stateless (e.g.
      bpf_cubic.c and bpf_dctcp.c).  The "reuse map" feature is not
      implemented in this patch and it can be considered later also.
      
      For other subcmds, please see the man doc for details.
      
      A sample output of dump:
      [root@arch-fb-vm1 bpf]# bpftool struct_ops dump name cubic
      [{
              "bpf_map_info": {
                  "type": 26,
                  "id": 64,
                  "key_size": 4,
                  "value_size": 256,
                  "max_entries": 1,
                  "map_flags": 0,
                  "name": "cubic",
                  "ifindex": 0,
                  "btf_vmlinux_value_type_id": 18452,
                  "netns_dev": 0,
                  "netns_ino": 0,
                  "btf_id": 52,
                  "btf_key_type_id": 0,
                  "btf_value_type_id": 0
              }
          },{
              "bpf_struct_ops_tcp_congestion_ops": {
                  "refcnt": {
                      "refs": {
                          "counter": 1
                      }
                  },
                  "state": "BPF_STRUCT_OPS_STATE_INUSE",
                  "data": {
                      "list": {
                          "next": 0,
                          "prev": 0
                      },
                      "key": 0,
                      "flags": 0,
                      "init": "void (struct sock *) bictcp_init/prog_id:138",
                      "release": "void (struct sock *) 0",
                      "ssthresh": "u32 (struct sock *) bictcp_recalc_ssthresh/prog_id:141",
                      "cong_avoid": "void (struct sock *, u32, u32) bictcp_cong_avoid/prog_id:140",
                      "set_state": "void (struct sock *, u8) bictcp_state/prog_id:142",
                      "cwnd_event": "void (struct sock *, enum tcp_ca_event) bictcp_cwnd_event/prog_id:139",
                      "in_ack_event": "void (struct sock *, u32) 0",
                      "undo_cwnd": "u32 (struct sock *) tcp_reno_undo_cwnd/prog_id:144",
                      "pkts_acked": "void (struct sock *, const struct ack_sample *) bictcp_acked/prog_id:143",
                      "min_tso_segs": "u32 (struct sock *) 0",
                      "sndbuf_expand": "u32 (struct sock *) 0",
                      "cong_control": "void (struct sock *, const struct rate_sample *) 0",
                      "get_info": "size_t (struct sock *, u32, int *, union tcp_cc_info *) 0",
                      "name": "bpf_cubic",
                      "owner": 0
                  }
              }
          }
      ]
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NQuentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/20200318171656.129650-1-kafai@fb.com
      65c93628
    • M
      bpftool: Translate prog_id to its bpf prog_name · d5ae04da
      Martin KaFai Lau 提交于
      The kernel struct_ops obj has kernel's func ptrs implemented by bpf_progs.
      The bpf prog_id is stored as the value of the func ptr for introspection
      purpose.  In the latter patch, a struct_ops dump subcmd will be added
      to introspect these func ptrs.  It is desired to print the actual bpf
      prog_name instead of only printing the prog_id.
      
      Since struct_ops is the only usecase storing prog_id in the func ptr,
      this patch adds a prog_id_as_func_ptr bool (default is false) to
      "struct btf_dumper" in order not to mis-interpret the ptr value
      for the other existing use-cases.
      
      While printing a func_ptr as a bpf prog_name,
      this patch also prefix the bpf prog_name with the ptr's func_proto.
      [ Note that it is the ptr's func_proto instead of the bpf prog's
        func_proto ]
      It reuses the current btf_dump_func() to obtain the ptr's func_proto
      string.
      
      Here is an example from the bpf_cubic.c:
      "void (struct sock *, u32, u32) bictcp_cong_avoid/prog_id:140"
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NQuentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/20200318171650.129252-1-kafai@fb.com
      d5ae04da
    • M
      bpftool: Print as a string for char array · 30255d31
      Martin KaFai Lau 提交于
      A char[] is currently printed as an integer array.
      This patch will print it as a string when
      1) The array element type is an one byte int
      2) The array element type has a BTF_INT_CHAR encoding or
         the array element type's name is "char"
      3) All characters is between (0x1f, 0x7f) and it is terminated
         by a null character.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NQuentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/20200318171643.129021-1-kafai@fb.com
      30255d31
    • M
      bpftool: Print the enum's name instead of value · ca7e6e45
      Martin KaFai Lau 提交于
      This patch prints the enum's name if there is one found in
      the array of btf_enum.
      
      The commit 9eea9849 ("bpf: fix BTF verification of enums")
      has details about an enum could have any power-of-2 size (up to 8 bytes).
      This patch also takes this chance to accommodate these non 4 byte
      enums.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NQuentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/20200318171637.128862-1-kafai@fb.com
      ca7e6e45
  8. 19 3月, 2020 1 次提交
  9. 18 3月, 2020 6 次提交
  10. 17 3月, 2020 1 次提交
    • J
      sfc: fix XDP-redirect in this driver · 86e85bf6
      Jesper Dangaard Brouer 提交于
      XDP-redirect is broken in this driver sfc. XDP_REDIRECT requires
      tailroom for skb_shared_info when creating an SKB based on the
      redirected xdp_frame (both in cpumap and veth).
      
      The fix requires some initial explaining. The driver uses RX page-split
      when possible. It reserves the top 64 bytes in the RX-page for storing
      dma_addr (struct efx_rx_page_state). It also have the XDP recommended
      headroom of XDP_PACKET_HEADROOM (256 bytes). As it doesn't reserve any
      tailroom, it can still fit two standard MTU (1500) frames into one page.
      
      The sizeof struct skb_shared_info in 320 bytes. Thus drivers like ixgbe
      and i40e, reduce their XDP headroom to 192 bytes, which allows them to
      fit two frames with max 1536 bytes into a 4K page (192+1536+320=2048).
      
      The fix is to reduce this drivers headroom to 128 bytes and add the 320
      bytes tailroom. This account for reserved top 64 bytes in the page, and
      still fit two frame in a page for normal MTUs.
      
      We must never go below 128 bytes of headroom for XDP, as one cacheline
      is for xdp_frame area and next cacheline is reserved for metadata area.
      
      Fixes: eb9a36be ("sfc: perform XDP processing on received packets")
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86e85bf6