1. 14 August 2020 (3 commits)
    • bpf: sock_ops sk access may stomp registers when dst_reg = src_reg · 84f44df6
      John Fastabend committed
      Similar to the patch ("bpf: sock_ops ctx access may stomp registers"), if
      src_reg = dst_reg when reading the sk field of a sock_ops struct we
      generate this xlated code,
      
        53: (61) r9 = *(u32 *)(r9 +28)
        54: (15) if r9 == 0x0 goto pc+3
        56: (79) r9 = *(u64 *)(r9 +0)
      
      This stomps on the r9 register to do the sk_fullsock check, and then when
      reading the skops->sk field we get the sk_fullsock value instead of the
      sk pointer. To fix, use the pattern noted in the previous fix and use the
      temp field to save/restore the register used to do the sk_fullsock check.
      
      After the fix the generated xlated code reads,
      
        52: (7b) *(u64 *)(r9 +32) = r8
        53: (61) r8 = *(u32 *)(r9 +28)
        54: (15) if r8 == 0x0 goto pc+3
        55: (79) r8 = *(u64 *)(r9 +32)
        56: (79) r9 = *(u64 *)(r9 +0)
        57: (05) goto pc+1
        58: (79) r8 = *(u64 *)(r9 +32)
      
      Here the r9 register was in use, so r8 is chosen as the temporary register.
      In line 52 r8 is saved in the temp variable and at line 55 restored in case
      fullsock != 0. Finally we handle the fullsock == 0 case by restoring at
      line 58.
      
      This adds a new macro, SOCK_OPS_GET_SK. It is almost possible to merge
      this with SOCK_OPS_GET_FIELD, but I found the extra branch logic a bit
      more confusing than just adding a new macro, despite a bit of duplicated
      code.
      
      Fixes: 1314ef56 ("bpf: export bpf_sock for BPF_PROG_TYPE_SOCK_OPS prog type")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/159718349653.4728.6559437186853473612.stgit@john-Precision-5820-Tower
    • bpf: sock_ops ctx access may stomp registers in corner case · fd09af01
      John Fastabend committed
      I had a sockmap program that after doing some refactoring started spewing
      this splat at me:
      
      [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
      [...]
      [18610.807359] Call Trace:
      [18610.807370]  ? 0xffffffffc114d0d5
      [18610.807382]  __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0
      [18610.807391]  tcp_connect+0x895/0xd50
      [18610.807400]  tcp_v4_connect+0x465/0x4e0
      [18610.807407]  __inet_stream_connect+0xd6/0x3a0
      [18610.807412]  ? __inet_stream_connect+0x5/0x3a0
      [18610.807417]  inet_stream_connect+0x3b/0x60
      [18610.807425]  __sys_connect+0xed/0x120
      
      After some debugging I was able to build this simple reproducer,
      
       __section("sockops/reproducer_bad")
       int bpf_reproducer_bad(struct bpf_sock_ops *skops)
       {
              volatile __maybe_unused __u32 i = skops->snd_ssthresh;
              return 0;
       }
      
      And along the way noticed that the program below ran without a splat,
      
      __section("sockops/reproducer_good")
      int bpf_reproducer_good(struct bpf_sock_ops *skops)
      {
              volatile __maybe_unused __u32 i = skops->snd_ssthresh;
              volatile __maybe_unused __u32 family;
      
              compiler_barrier();
      
              family = skops->family;
              return 0;
      }
      
      So I decided to check out the code we generate for the above two
      programs and noticed each generates the BPF code you would expect,
      
      0000000000000000 <bpf_reproducer_bad>:
      ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
             0:       r1 = *(u32 *)(r1 + 96)
             1:       *(u32 *)(r10 - 4) = r1
      ;       return 0;
             2:       r0 = 0
             3:       exit
      
      0000000000000000 <bpf_reproducer_good>:
      ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
             0:       r2 = *(u32 *)(r1 + 96)
             1:       *(u32 *)(r10 - 4) = r2
      ;       family = skops->family;
             2:       r1 = *(u32 *)(r1 + 20)
             3:       *(u32 *)(r10 - 8) = r1
      ;       return 0;
             4:       r0 = 0
             5:       exit
      
      So we get reasonable assembly, but still something was causing the null
      pointer dereference. So, we load the programs and dump the xlated version,
      observing that line 0 above, 'r* = *(u32 *)(r1 +96)', is going to be
      translated by the skops access helpers.
      
      int bpf_reproducer_bad(struct bpf_sock_ops * skops):
      ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
         0: (61) r1 = *(u32 *)(r1 +28)
         1: (15) if r1 == 0x0 goto pc+2
         2: (79) r1 = *(u64 *)(r1 +0)
         3: (61) r1 = *(u32 *)(r1 +2340)
      ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
         4: (63) *(u32 *)(r10 -4) = r1
      ; return 0;
         5: (b7) r0 = 0
         6: (95) exit
      
      int bpf_reproducer_good(struct bpf_sock_ops * skops):
      ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
         0: (61) r2 = *(u32 *)(r1 +28)
         1: (15) if r2 == 0x0 goto pc+2
         2: (79) r2 = *(u64 *)(r1 +0)
         3: (61) r2 = *(u32 *)(r2 +2340)
      ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
         4: (63) *(u32 *)(r10 -4) = r2
      ; family = skops->family;
         5: (79) r1 = *(u64 *)(r1 +0)
         6: (69) r1 = *(u16 *)(r1 +16)
      ; family = skops->family;
         7: (63) *(u32 *)(r10 -8) = r1
      ; return 0;
         8: (b7) r0 = 0
         9: (95) exit
      
      Then we look at lines 0 and 2 above. In the good case we do the zero
      check in r2 and then load 'r1 + 0' at line 2. A quick cross-check into
      the bpf_sock_ops conversion code confirms that this is the 'struct
      sock *sk' pointer field. But, in the bad case,
      
         0: (61) r1 = *(u32 *)(r1 +28)
         1: (15) if r1 == 0x0 goto pc+2
         2: (79) r1 = *(u64 *)(r1 +0)
      
      Oh no, we read 'r1 +28' into r1; this is skops->fullsock, and then in
      line 2 we read 'r1 +0' as a pointer. Now, jumping back to our splat,
      
      [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
      
      The 0x01 makes sense because that is exactly the fullsock value. And
      it's not a valid dereference, so we splat.
      
      To fix, we need to guard the case when a program is doing a sock_ops
      field access with src_reg == dst_reg. This is already handled in the
      load case, where the ctx_access handler uses a tmp register, being
      careful to store the old value and restore it. To fix the get case,
      test if src_reg == dst_reg and, in this case, do the is_fullsock test
      in the temporary register, remembering to restore the temporary register
      before writing to either dst_reg or src_reg to avoid smashing the
      pointer into the struct holding the tmp variable.
      
      After adding this inline code to test_tcpbpf_kern, it is now generated
      correctly, from
      
        9: r2 = *(u32 *)(r2 + 96)
      
      to xlated code,
      
        12: (7b) *(u64 *)(r2 +32) = r9
        13: (61) r9 = *(u32 *)(r2 +28)
        14: (15) if r9 == 0x0 goto pc+4
        15: (79) r9 = *(u64 *)(r2 +32)
        16: (79) r2 = *(u64 *)(r2 +0)
        17: (61) r2 = *(u32 *)(r2 +2348)
        18: (05) goto pc+1
        19: (79) r9 = *(u64 *)(r2 +32)
      
      And in the normal case we keep the original code, because really this
      is an edge case. From this,
      
        9: r2 = *(u32 *)(r6 + 96)
      
      to xlated code,
      
        22: (61) r2 = *(u32 *)(r6 +28)
        23: (15) if r2 == 0x0 goto pc+2
        24: (79) r2 = *(u64 *)(r6 +0)
        25: (61) r2 = *(u32 *)(r2 +2348)
      
      So three additional instructions if dst == src register, but I scanned
      my current code base and did not see this pattern anywhere, so it should
      not be a big deal. Further, it seems no one else has hit this, or at
      least reported it, so it must be a fairly rare pattern.
      
      Fixes: 9b1f3d6e ("bpf: Refactor sock_ops_convert_ctx_access")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/159718347772.4728.2781381670567919577.stgit@john-Precision-5820-Tower
    • libbpf: Prevent overriding errno when logging errors · 23ab656b
      Toke Høiland-Jørgensen committed
      Turns out there were a few more instances where libbpf didn't save the
      errno before writing an error message, causing errno to be overridden by
      the printf() return and the error disappearing if logging is enabled.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200813142905.160381-1-toke@redhat.com
  2. 13 August 2020 (4 commits)
  3. 11 August 2020 (8 commits)
  4. 09 August 2020 (6 commits)
  5. 08 August 2020 (9 commits)
  6. 07 August 2020 (10 commits)
    • drivers/net/wan/lapbether: Added needed_headroom and a skb->len check · c7ca03c2
      Xie He committed
      1. Added a skb->len check
      
      This driver expects upper layers to include a pseudo header of 1 byte
      when passing down a skb for transmission. This driver will read this
      1-byte header. This patch added a skb->len check before reading the
      header to make sure the header exists.
      
      2. Changed to use needed_headroom instead of hard_header_len to request
      necessary headroom to be allocated
      
      In net/packet/af_packet.c, the function packet_snd first reserves a
      headroom of length (dev->hard_header_len + dev->needed_headroom).
      Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header,
      which calls dev->header_ops->create, to create the link layer header.
      If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of
      length (dev->hard_header_len), and expects the user to provide the
      appropriate link layer header.
      
      So according to the logic of af_packet.c, dev->hard_header_len should
      be the length of the header that would be created by
      dev->header_ops->create.
      
      However, this driver doesn't provide dev->header_ops, so logically
      dev->hard_header_len should be 0.
      
      So we should use dev->needed_headroom instead of dev->hard_header_len
      to request necessary headroom to be allocated.
      
      This change fixes kernel panic when this driver is used with AF_PACKET
      SOCK_RAW sockets.
      
      Call stack when panic:
      
      [  168.399197] skbuff: skb_under_panic: text:ffffffff819d95fb len:20
      put:14 head:ffff8882704c0a00 data:ffff8882704c09fd tail:0x11 end:0xc0
      dev:veth0
      ...
      [  168.399255] Call Trace:
      [  168.399259]  skb_push.cold+0x14/0x24
      [  168.399262]  eth_header+0x2b/0xc0
      [  168.399267]  lapbeth_data_transmit+0x9a/0xb0 [lapbether]
      [  168.399275]  lapb_data_transmit+0x22/0x2c [lapb]
      [  168.399277]  lapb_transmit_buffer+0x71/0xb0 [lapb]
      [  168.399279]  lapb_kick+0xe3/0x1c0 [lapb]
      [  168.399281]  lapb_data_request+0x76/0xc0 [lapb]
      [  168.399283]  lapbeth_xmit+0x56/0x90 [lapbether]
      [  168.399286]  dev_hard_start_xmit+0x91/0x1f0
      [  168.399289]  ? irq_init_percpu_irqstack+0xc0/0x100
      [  168.399291]  __dev_queue_xmit+0x721/0x8e0
      [  168.399295]  ? packet_parse_headers.isra.0+0xd2/0x110
      [  168.399297]  dev_queue_xmit+0x10/0x20
      [  168.399298]  packet_sendmsg+0xbf0/0x19b0
      ......
      
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Martin Schiller <ms@dev.tdt.de>
      Cc: Brian Norris <briannorris@chromium.org>
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Fix compilation warning of selftests · 929e54a9
      Jianlin Lv committed
      Clang compiler version: 12.0.0
      The following warning appears during the selftests/bpf compilation:
      
      prog_tests/send_signal.c:51:3: warning: ignoring return value of ‘write’,
      declared with attribute warn_unused_result [-Wunused-result]
         51 |   write(pipe_c2p[1], buf, 1);
            |   ^~~~~~~~~~~~~~~~~~~~~~~~~~
      prog_tests/send_signal.c:54:3: warning: ignoring return value of ‘read’,
      declared with attribute warn_unused_result [-Wunused-result]
         54 |   read(pipe_p2c[0], buf, 1);
            |   ^~~~~~~~~~~~~~~~~~~~~~~~~
      ......
      
      prog_tests/stacktrace_build_id_nmi.c:13:2: warning: ignoring return value
      of ‘fscanf’, declared with attribute warn_unused_result [-Wunused-result]
         13 |  fscanf(f, "%llu", &sample_freq);
            |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      test_tcpnotify_user.c:133:2: warning: ignoring return value of ‘system’,
      declared with attribute warn_unused_result [-Wunused-result]
        133 |  system(test_script);
            |  ^~~~~~~~~~~~~~~~~~~
      test_tcpnotify_user.c:138:2: warning: ignoring return value of ‘system’,
      declared with attribute warn_unused_result [-Wunused-result]
        138 |  system(test_script);
            |  ^~~~~~~~~~~~~~~~~~~
      test_tcpnotify_user.c:143:2: warning: ignoring return value of ‘system’,
      declared with attribute warn_unused_result [-Wunused-result]
        143 |  system(test_script);
            |  ^~~~~~~~~~~~~~~~~~~
      
      Add code that fixes the compilation warnings about ignored return values
      and handles any errors; checking the return values of library APIs makes
      the code more robust.
      Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200806104224.95306-1-Jianlin.Lv@arm.com
    • selftests: bpf: Switch off timeout · 6fc5916c
      Jiri Benc committed
      Several bpf tests are interrupted by the default timeout of 45 seconds added
      by commit 852c8cbf ("selftests/kselftest/runner.sh: Add 45 second
      timeout per test"). In my case it was test_progs, test_tunnel.sh,
      test_lwt_ip_encap.sh and test_xdping.sh.
      
      There's not much value in having a timeout for bpf tests, switch it off.
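
      For reference, the kselftest runner reads per-directory overrides from a
      `settings` file, so the change amounts to a fragment like the following
      (path and key assumed from the kselftest runner's conventions, not
      quoted from the patch):

```
# tools/testing/selftests/bpf/settings
timeout=0
```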
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/7a9198ed10917f4ecab4a3dd74bcda1200791efd.1596739059.git.jbenc@redhat.com
    • bpf: Remove inline from bpf_do_trace_printk · 0d360d64
      Stanislav Fomichev committed
      I get the following error during compilation on my side:
      kernel/trace/bpf_trace.c: In function 'bpf_do_trace_printk':
      kernel/trace/bpf_trace.c:386:34: error: function 'bpf_do_trace_printk' can never be inlined because it uses variable argument lists
       static inline __printf(1, 0) int bpf_do_trace_printk(const char *fmt, ...)
                                        ^
      
      Fixes: ac5a72ea ("bpf: Use dedicated bpf_trace_printk event instead of trace_printk()")
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200806182612.1390883-1-sdf@google.com
    • bpf: Add missing return to resolve_btfids · d48556f4
      Stanislav Fomichev committed
      int sets_patch(struct object *obj) doesn't have a 'return 0' at the end.
      
      Fixes: fbbb68de ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200806155225.637202-1-sdf@google.com
    • libbpf: Fix uninitialized pointer at btf__parse_raw() · 932ac54a
      Daniel T. Lee committed
      Recently, commit 94a1fedd ("libbpf: Add btf__parse_raw() and
      generic btf__parse() APIs") added a new API to libbpf that allows
      parsing BTF from a raw data file (btf__parse_raw()).
      
      That commit causes a build failure in samples/bpf due to access of an
      uninitialized pointer in btf__parse_raw().
      
          btf.c: In function btf__parse_raw:
          btf.c:625:28: error: btf may be used uninitialized in this function
            625 |  return err ? ERR_PTR(err) : btf;
                |         ~~~~~~~~~~~~~~~~~~~^~~~~
      
      This commit fixes the build failure of samples/bpf by initializing the
      btf pointer to NULL.
      
      Fixes: 94a1fedd ("libbpf: Add btf__parse_raw() and generic btf__parse() APIs")
      Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200805223359.32109-1-danieltimlee@gmail.com
    • Merge branch 'bpf_iter-uapi-fix' · 0ac10dc1
      Alexei Starovoitov committed
      Yonghong Song says:
      
      ====================
      Andrii raised a concern that the current uapi for the bpf iterator map
      element is a little restrictive and not suitable for future, potentially
      complex customization. This is a valid suggestion, considering people
      may indeed add more complex customization to the iterator, e.g.,
      cgroup_id + user_id, etc. for task or task_file. Another example might
      be map_id plus additional control so that the bpf iterator may bail out
      of a bucket early if the bucket has too many elements, which may hold
      the lock too long and impact other parts of the system.
      
      Patch #1 modified uapi with kernel changes. Patch #2
      adjusted libbpf api accordingly.
      
      Changelogs:
        v3 -> v4:
          . add a forward declaration of bpf_iter_link_info in
            tools/lib/bpf/bpf.h in case that libbpf is built against
            not-latest uapi bpf.h.
          . target the patch set to "bpf" instead of "bpf-next"
        v2 -> v3:
          . undo "not reject iter_info.map.map_fd == 0" from v1.
            In the future map_fd may become optional, so let us use map_fd == 0
            indicating the map_fd is not set by user space.
          . add link_info_len to bpf_iter_attach_opts to ensure always correct
            link_info_len from user. Otherwise, libbpf may deduce incorrect
            link_info_len if it uses different uapi header than the user app.
        v1 -> v2:
          . ensure link_create target_fd/flags == 0 since they are not used. (Andrii)
          . if either of iter_info ptr == 0 or iter_info_len == 0, but not both,
            return error to user space. (Andrii)
          . do not reject iter_info.map.map_fd == 0, go ahead to use it trying to
            get a map reference since the map_fd is required for map_elem iterator.
          . use bpf_iter_link_info in bpf_iter_attach_opts instead of map_fd.
            this way, user space is responsible to set up bpf_iter_link_info and
            libbpf just passes the data to the kernel, simplifying libbpf design.
            (Andrii)
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: Support new uapi for map element bpf iterator · 74fc097d
      Yonghong Song committed
      The previous commit adjusted the kernel uapi for the map element bpf
      iterator. This patch adjusts the libbpf API due to the uapi change;
      bpftool and the bpf_iter selftests are changed accordingly.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200805055058.1457623-1-yhs@fb.com
    • bpf: Change uapi for bpf iterator map elements · 5e7b3020
      Yonghong Song committed
      Commit a5cbe05a ("bpf: Implement bpf iterator for map elements")
      added bpf iterator support for map elements. The map element bpf
      iterator requires info to identify a particular map. In the above
      commit, attr->link_create.target_fd is used to carry the map_fd, and an
      enum bpf_iter_link_info is added to the uapi to specify that the
      target_fd actually represents a map_fd:
          enum bpf_iter_link_info {
      	BPF_ITER_LINK_UNSPEC = 0,
      	BPF_ITER_LINK_MAP_FD = 1,
      
      	MAX_BPF_ITER_LINK_INFO,
          };
      
      This is an extensible approach, as we can grow the enumerator for pid,
      cgroup_id, etc., and unionize target_fd for pid, cgroup_id, etc. But in
      the future, there is a chance that more complex customization may be
      needed, e.g., for tasks, filtering based on both cgroup_id and user_id.
      
      This patch changed the uapi to have fields
      	__aligned_u64	iter_info;
      	__u32		iter_info_len;
      for additional iter_info for link_create.
      The iter_info is defined as
      	union bpf_iter_link_info {
      		struct {
      			__u32   map_fd;
      		} map;
      	};
      
      So future extension for additional customization will be easier. The
      bpf_iter_link_info will be passed to the target callback to validate,
      and the generic bpf_iter framework does not need to deal with it any
      more.
      
      Note that map_fd = 0 will be considered invalid
      and -EBADF will be returned to user space.
      
      Fixes: a5cbe05a ("bpf: Implement bpf iterator for map elements")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200805055056.1457463-1-yhs@fb.com
    • selftests/bpf: Prevent runqslower from racing on building bpftool · 6bcaf41f
      Andrii Nakryiko committed
      runqslower's Makefile is building/installing bpftool into
      $(OUTPUT)/sbin/bpftool, which coincides with $(DEFAULT_BPFTOOL). In
      practice this means that often, when building selftests from scratch
      (after `make clean`), selftests race with runqslower to simultaneously
      build bpftool, and one of the two processes fails due to the file being
      busy. Prevent this race by explicitly order-depending on
      $(DEFAULT_BPFTOOL).
      
      Fixes: a2c9652f ("selftests: Refactor build to remove tools/lib/bpf from include path")
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200805004757.2960750-1-andriin@fb.com