1. 27 7月, 2017 1 次提交
    • J
      bpf: don't zero out the info struct in bpf_obj_get_info_by_fd() · d777b2dd
      Jakub Kicinski 提交于
      The buffer passed to bpf_obj_get_info_by_fd() should be initialized
      to zeros.  Kernel will enforce that to guarantee we can safely extend
      info structures in the future.
      
      Making the bpf_obj_get_info_by_fd() call in libbpf perform the zeroing
      is problematic, however, since some members of the info structures
      may need to be initialized by the callers (for instance pointers
      to buffers to which kernel is to dump translated and jited images).
      
      Remove the zeroing and fix up the in-tree callers before any kernel
      has been released with this code.
      
      As Daniel points out this seems to be the intended operation anyway,
      since commit 95b9afd3 ("bpf: Test for bpf ID") is itself setting
      the buffer pointers before calling bpf_obj_get_info_by_fd().
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d777b2dd
  2. 25 7月, 2017 1 次提交
  3. 21 7月, 2017 4 次提交
  4. 12 7月, 2017 1 次提交
    • Y
      samples/bpf: fix a build issue · 53335022
      Yonghong Song 提交于
      With latest net-next:
      
      ====
      clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h  -Isamples/bpf \
          -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
          -Wno-compare-distinct-pointer-types \
          -Wno-gnu-variable-sized-type-not-at-end \
          -Wno-address-of-packed-member -Wno-tautological-compare \
          -Wno-unknown-warning-option \
          -O2 -emit-llvm -c samples/bpf/tcp_synrto_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/tcp_synrto_kern.o
      samples/bpf/tcp_synrto_kern.c:20:10: fatal error: 'bpf_endian.h' file not found
                ^~~~~~~~~~~~~~
      1 error generated.
      ====
      
      net has the same issue.
      
      Add support for ntohl and htonl in tools/testing/selftests/bpf/bpf_endian.h.
      Also move bpf_helpers.h from samples/bpf to selftests/bpf and change
      compiler include logic so that programs in samples/bpf can access the headers
      in selftests/bpf, but not the other way around.
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53335022
  5. 03 7月, 2017 1 次提交
  6. 30 6月, 2017 1 次提交
    • D
      bpf: prevent leaking pointer via xadd on unpriviledged · 6bdf6abc
      Daniel Borkmann 提交于
      Leaking kernel addresses on unpriviledged is generally disallowed,
      for example, verifier rejects the following:
      
        0: (b7) r0 = 0
        1: (18) r2 = 0xffff897e82304400
        3: (7b) *(u64 *)(r1 +48) = r2
        R2 leaks addr into ctx
      
      Doing pointer arithmetic on them is also forbidden, so that they
      don't turn into unknown value and then get leaked out. However,
      there's xadd as a special case, where we don't check the src reg
      for being a pointer register, e.g. the following will pass:
      
        0: (b7) r0 = 0
        1: (7b) *(u64 *)(r1 +48) = r0
        2: (18) r2 = 0xffff897e82304400 ; map
        4: (db) lock *(u64 *)(r1 +48) += r2
        5: (95) exit
      
      We could store the pointer into skb->cb, loose the type context,
      and then read it out from there again to leak it eventually out
      of a map value. Or more easily in a different variant, too:
      
         0: (bf) r6 = r1
         1: (7a) *(u64 *)(r10 -8) = 0
         2: (bf) r2 = r10
         3: (07) r2 += -8
         4: (18) r1 = 0x0
         6: (85) call bpf_map_lookup_elem#1
         7: (15) if r0 == 0x0 goto pc+3
         R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp
         8: (b7) r3 = 0
         9: (7b) *(u64 *)(r0 +0) = r3
        10: (db) lock *(u64 *)(r0 +0) += r6
        11: (b7) r0 = 0
        12: (95) exit
      
        from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp
        11: (b7) r0 = 0
        12: (95) exit
      
      Prevent this by checking xadd src reg for pointer types. Also
      add a couple of test cases related to this.
      
      Fixes: 1be7f75d ("bpf: enable non-root eBPF programs")
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6bdf6abc
  7. 15 6月, 2017 2 次提交
    • Y
      selftests/bpf: Add test cases to test narrower ctx field loads · 18f3d6be
      Yonghong Song 提交于
      Add test cases in test_verifier and test_progs.
      Negative tests are added in test_verifier as well.
      The test in test_progs will compare the value of narrower ctx field
      load result vs. the masked value of normal full-field load result,
      and will fail if they are not the same.
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18f3d6be
    • Y
      bpf: permits narrower load from bpf program context fields · 31fd8581
      Yonghong Song 提交于
      Currently, verifier will reject a program if it contains an
      narrower load from the bpf context structure. For example,
              __u8 h = __sk_buff->hash, or
              __u16 p = __sk_buff->protocol
              __u32 sample_period = bpf_perf_event_data->sample_period
      which are narrower loads of 4-byte or 8-byte field.
      
      This patch solves the issue by:
        . Introduce a new parameter ctx_field_size to carry the
          field size of narrower load from prog type
          specific *__is_valid_access validator back to verifier.
        . The non-zero ctx_field_size for a memory access indicates
          (1). underlying prog type specific convert_ctx_accesses
               supporting non-whole-field access
          (2). the current insn is a narrower or whole field access.
        . In verifier, for such loads where load memory size is
          less than ctx_field_size, verifier transforms it
          to a full field load followed by proper masking.
        . Currently, __sk_buff and bpf_perf_event_data->sample_period
          are supporting narrowing loads.
        . Narrower stores are still not allowed as typical ctx stores
          are just normal stores.
      
      Because of this change, some tests in verifier will fail and
      these tests are removed. As a bonus, rename some out of bound
      __sk_buff->cb access to proper field name and remove two
      redundant "skb cb oob" tests.
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31fd8581
  8. 14 6月, 2017 1 次提交
  9. 11 6月, 2017 2 次提交
  10. 10 6月, 2017 2 次提交
  11. 09 6月, 2017 1 次提交
    • D
      bpf, tests: fix endianness selection · 78a5a93c
      Daniel Borkmann 提交于
      I noticed that test_l4lb was failing in selftests:
      
        # ./test_progs
        test_pkt_access:PASS:ipv4 77 nsec
        test_pkt_access:PASS:ipv6 44 nsec
        test_xdp:PASS:ipv4 2933 nsec
        test_xdp:PASS:ipv6 1500 nsec
        test_l4lb:PASS:ipv4 377 nsec
        test_l4lb:PASS:ipv6 544 nsec
        test_l4lb:FAIL:stats 6297600000 200000
        test_tcp_estats:PASS: 0 nsec
        Summary: 7 PASSED, 1 FAILED
      
      Tracking down the issue actually revealed that endianness selection
      in bpf_endian.h is broken when compiled with clang with bpf target.
      test_pkt_access.c, test_l4lb.c is compiled with __BYTE_ORDER as
      __BIG_ENDIAN, test_xdp.c as __LITTLE_ENDIAN! test_l4lb noticeably
      fails, because the test accounts bytes via bpf_ntohs(ip6h->payload_len)
      and bpf_ntohs(iph->tot_len), and compares them against a defined
      value and given a wrong endianness, the test outcome is different,
      of course.
      
      Turns out that there are actually two bugs: i) when we do __BYTE_ORDER
      comparison with __LITTLE_ENDIAN/__BIG_ENDIAN, then depending on the
      include order we see different outcomes. Reason is that __BYTE_ORDER
      is undefined due to missing endian.h include. Before we include the
      asm/byteorder.h (e.g. through linux/in.h), then __BYTE_ORDER equals
      __LITTLE_ENDIAN since both are undefined, after the include which
      correctly pulls in linux/byteorder/little_endian.h, __LITTLE_ENDIAN
      is defined, but given __BYTE_ORDER is still undefined, we match on
      __BYTE_ORDER equals to __BIG_ENDIAN since __BIG_ENDIAN is also
      undefined at that point, sigh. ii) But even that would be wrong,
      since when compiling the test cases with clang, one can select between
      bpfeb and bpfel targets for cross compilation. Hence, we can also not
      rely on what the system's endian.h provides, but we need to look at
      the compiler's defined endianness. The compiler defines __BYTE_ORDER__,
      and we can match __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__,
      which also reflects targets bpf (native), bpfel, bpfeb correctly,
      thus really only rely on that. After patch:
      
        # ./test_progs
        test_pkt_access:PASS:ipv4 74 nsec
        test_pkt_access:PASS:ipv6 42 nsec
        test_xdp:PASS:ipv4 2340 nsec
        test_xdp:PASS:ipv6 1461 nsec
        test_l4lb:PASS:ipv4 400 nsec
        test_l4lb:PASS:ipv6 530 nsec
        test_tcp_estats:PASS: 0 nsec
        Summary: 7 PASSED, 0 FAILED
      
      Fixes: 43bcf707 ("bpf: fix _htons occurences in test_progs")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      78a5a93c
  12. 07 6月, 2017 1 次提交
  13. 26 5月, 2017 1 次提交
  14. 18 5月, 2017 1 次提交
    • Y
      selftests/bpf: fix broken build due to types.h · 579f1d92
      Yonghong Song 提交于
      Commit 0a5539f6 ("bpf: Provide a linux/types.h override
      for bpf selftests.") caused a build failure for tools/testing/selftest/bpf
      because of some missing types:
          $ make -C tools/testing/selftests/bpf/
          ...
          In file included from /home/yhs/work/net-next/tools/testing/selftests/bpf/test_pkt_access.c:8:
          ../../../include/uapi/linux/bpf.h:170:3: error: unknown type name '__aligned_u64'
                          __aligned_u64   key;
          ...
          /usr/include/linux/swab.h:160:8: error: unknown type name '__always_inline'
          static __always_inline __u16 __swab16p(const __u16 *p)
          ...
      The type __aligned_u64 is defined in linux:include/uapi/linux/types.h.
      
      The fix is to copy missing type definition into
      tools/testing/selftests/bpf/include/uapi/linux/types.h.
      Adding additional include "string.h" resolves __always_inline issue.
      
      Fixes: 0a5539f6 ("bpf: Provide a linux/types.h override for bpf selftests.")
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      579f1d92
  15. 12 5月, 2017 4 次提交
  16. 03 5月, 2017 2 次提交
  17. 02 5月, 2017 3 次提交
  18. 01 5月, 2017 1 次提交
  19. 29 4月, 2017 3 次提交
  20. 25 4月, 2017 1 次提交
  21. 22 4月, 2017 2 次提交
    • D
      bpf: Fix values type used in test_maps · 89087c45
      David Miller 提交于
      Maps of per-cpu type have their value element size adjusted to 8 if it
      is specified smaller during various map operations.
      
      This makes test_maps as a 32-bit binary fail, in fact the kernel
      writes past the end of the value's array on the user's stack.
      
      To be quite honest, I think the kernel should reject creation of a
      per-cpu map that doesn't have a value size of at least 8 if that's
      what the kernel is going to silently adjust to later.
      
      If the user passed something smaller, it is a sizeof() calcualtion
      based upon the type they will actually use (just like in this testcase
      code) in later calls to the map operations.
      
      Fixes: df570f57 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      89087c45
    • D
      bpf: add napi_id read access to __sk_buff · b1d9fc41
      Daniel Borkmann 提交于
      Add napi_id access to __sk_buff for socket filter program types, tc
      program types and other bpf_convert_ctx_access() users. Having access
      to skb->napi_id is useful for per RX queue listener siloing, f.e.
      in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
      used, meaning SO_REUSEPORT enabled listeners can then select the
      corresponding socket at SYN time already [1]. The skb is marked via
      skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).
      
      Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d433902
      ("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
      the NAPI ID associated with the queue for steering, which requires a
      prior sk_mark_napi_id() after the socket was looked up.
      
      Semantics for the __sk_buff napi_id access are similar, meaning if
      skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
      then an invalid napi_id of 0 is returned to the program, otherwise a
      valid non-zero napi_id.
      
        [1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdfSuggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1d9fc41
  22. 18 4月, 2017 3 次提交
  23. 07 4月, 2017 1 次提交