1. 30 June 2017 (1 commit)
    • bpf: prevent leaking pointer via xadd on unpriviledged · 6bdf6abc
      Committed by Daniel Borkmann
      Leaking kernel addresses to unprivileged users is generally disallowed;
      for example, the verifier rejects the following:
      
        0: (b7) r0 = 0
        1: (18) r2 = 0xffff897e82304400
        3: (7b) *(u64 *)(r1 +48) = r2
        R2 leaks addr into ctx
      
      Doing pointer arithmetic on them is also forbidden, so that they
      don't turn into an unknown value and then get leaked out. However,
      there's xadd as a special case, where we don't check the src reg
      for being a pointer register, e.g. the following will pass:
      
        0: (b7) r0 = 0
        1: (7b) *(u64 *)(r1 +48) = r0
        2: (18) r2 = 0xffff897e82304400 ; map
        4: (db) lock *(u64 *)(r1 +48) += r2
        5: (95) exit
      
      We could store the pointer into skb->cb, lose the type context,
      and then read it out from there again to leak it eventually out
      of a map value. Or more easily in a different variant, too:
      
         0: (bf) r6 = r1
         1: (7a) *(u64 *)(r10 -8) = 0
         2: (bf) r2 = r10
         3: (07) r2 += -8
         4: (18) r1 = 0x0
         6: (85) call bpf_map_lookup_elem#1
         7: (15) if r0 == 0x0 goto pc+3
         R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp
         8: (b7) r3 = 0
         9: (7b) *(u64 *)(r0 +0) = r3
        10: (db) lock *(u64 *)(r0 +0) += r6
        11: (b7) r0 = 0
        12: (95) exit
      
        from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp
        11: (b7) r0 = 0
        12: (95) exit
      
      Prevent this by checking xadd src reg for pointer types. Also
      add a couple of test cases related to this.
      
      Fixes: 1be7f75d ("bpf: enable non-root eBPF programs")
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 June 2017 (1 commit)
    • bpf, tests: fix endianness selection · 78a5a93c
      Committed by Daniel Borkmann
      I noticed that test_l4lb was failing in selftests:
      
        # ./test_progs
        test_pkt_access:PASS:ipv4 77 nsec
        test_pkt_access:PASS:ipv6 44 nsec
        test_xdp:PASS:ipv4 2933 nsec
        test_xdp:PASS:ipv6 1500 nsec
        test_l4lb:PASS:ipv4 377 nsec
        test_l4lb:PASS:ipv6 544 nsec
        test_l4lb:FAIL:stats 6297600000 200000
        test_tcp_estats:PASS: 0 nsec
        Summary: 7 PASSED, 1 FAILED
      
      Tracking down the issue actually revealed that endianness selection
      in bpf_endian.h is broken when compiled with clang with bpf target.
      test_pkt_access.c and test_l4lb.c are compiled with __BYTE_ORDER as
      __BIG_ENDIAN, test_xdp.c as __LITTLE_ENDIAN! test_l4lb noticeably
      fails, because the test accounts bytes via bpf_ntohs(ip6h->payload_len)
      and bpf_ntohs(iph->tot_len), and compares them against a defined
      value and given a wrong endianness, the test outcome is different,
      of course.
      
      Turns out that there are actually two bugs: i) when we do __BYTE_ORDER
      comparison with __LITTLE_ENDIAN/__BIG_ENDIAN, then depending on the
      include order we see different outcomes. Reason is that __BYTE_ORDER
      is undefined due to missing endian.h include. Before we include the
      asm/byteorder.h (e.g. through linux/in.h), then __BYTE_ORDER equals
      __LITTLE_ENDIAN since both are undefined, after the include which
      correctly pulls in linux/byteorder/little_endian.h, __LITTLE_ENDIAN
      is defined, but given __BYTE_ORDER is still undefined, we match on
      __BYTE_ORDER equals to __BIG_ENDIAN since __BIG_ENDIAN is also
      undefined at that point, sigh. ii) But even that would be wrong,
      since when compiling the test cases with clang, one can select between
      bpfeb and bpfel targets for cross compilation. Hence, we can also not
      rely on what the system's endian.h provides, but we need to look at
      the compiler's defined endianness. The compiler defines __BYTE_ORDER__,
      and we can match __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__,
      which also reflects targets bpf (native), bpfel, bpfeb correctly,
      thus really only rely on that. After patch:
      
        # ./test_progs
        test_pkt_access:PASS:ipv4 74 nsec
        test_pkt_access:PASS:ipv6 42 nsec
        test_xdp:PASS:ipv4 2340 nsec
        test_xdp:PASS:ipv6 1461 nsec
        test_l4lb:PASS:ipv4 400 nsec
        test_l4lb:PASS:ipv6 530 nsec
        test_tcp_estats:PASS: 0 nsec
        Summary: 7 PASSED, 0 FAILED
      
      Fixes: 43bcf707 ("bpf: fix _htons occurences in test_progs")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 26 May 2017 (1 commit)
  4. 18 May 2017 (1 commit)
    • selftests/bpf: fix broken build due to types.h · 579f1d92
      Committed by Yonghong Song
      Commit 0a5539f6 ("bpf: Provide a linux/types.h override
      for bpf selftests.") caused a build failure for tools/testing/selftest/bpf
      because of some missing types:
          $ make -C tools/testing/selftests/bpf/
          ...
          In file included from /home/yhs/work/net-next/tools/testing/selftests/bpf/test_pkt_access.c:8:
          ../../../include/uapi/linux/bpf.h:170:3: error: unknown type name '__aligned_u64'
                          __aligned_u64   key;
          ...
          /usr/include/linux/swab.h:160:8: error: unknown type name '__always_inline'
          static __always_inline __u16 __swab16p(const __u16 *p)
          ...
      The type __aligned_u64 is defined in linux:include/uapi/linux/types.h.
      
      The fix is to copy missing type definition into
      tools/testing/selftests/bpf/include/uapi/linux/types.h.
      Adding an additional include of "string.h" resolves the __always_inline issue.
      
      Fixes: 0a5539f6 ("bpf: Provide a linux/types.h override for bpf selftests.")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 12 May 2017 (4 commits)
  6. 03 May 2017 (2 commits)
  7. 02 May 2017 (3 commits)
  8. 01 May 2017 (1 commit)
  9. 29 April 2017 (3 commits)
  10. 25 April 2017 (1 commit)
  11. 22 April 2017 (2 commits)
    • bpf: Fix values type used in test_maps · 89087c45
      Committed by David Miller
      Maps of per-cpu type have their value element size adjusted to 8 if it
      is specified smaller during various map operations.
      
      This makes test_maps fail as a 32-bit binary; in fact, the kernel
      writes past the end of the value array on the user's stack.
      
      To be quite honest, I think the kernel should reject creation of a
      per-cpu map that doesn't have a value size of at least 8 if that's
      what the kernel is going to silently adjust to later.
      
      If the user passed something smaller, it is likely a sizeof() calculation
      based upon the type they will actually use (just like in this testcase
      code) in later calls to the map operations.
      
      Fixes: df570f57 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add napi_id read access to __sk_buff · b1d9fc41
      Committed by Daniel Borkmann
      Add napi_id access to __sk_buff for socket filter program types, tc
      program types and other bpf_convert_ctx_access() users. Having access
      to skb->napi_id is useful for per RX queue listener siloing, f.e.
      in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
      used, meaning SO_REUSEPORT enabled listeners can then select the
      corresponding socket at SYN time already [1]. The skb is marked via
      skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).
      
      Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d433902
      ("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
      the NAPI ID associated with the queue for steering, which requires a
      prior sk_mark_napi_id() after the socket was looked up.
      
      Semantics for the __sk_buff napi_id access are similar, meaning if
      skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
      then an invalid napi_id of 0 is returned to the program, otherwise a
      valid non-zero napi_id.
      
        [1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf

      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 18 April 2017 (3 commits)
  13. 07 April 2017 (1 commit)
  14. 02 April 2017 (4 commits)
  15. 25 March 2017 (1 commit)
  16. 23 March 2017 (2 commits)
    • bpf: Add tests for map-in-map · fb30d4b7
      Committed by Martin KaFai Lau
      Test cases for array of maps and hash of maps.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix hashmap extra_elems logic · 8c290e60
      Committed by Alexei Starovoitov
      In both kmalloc and prealloc mode the bpf_map_update_elem() is using
      per-cpu extra_elems to do atomic update when the map is full.
      There are two issues with it. The logic can be misused, since it allows
      max_entries+num_cpus elements to be present in the map. And alloc_extra_elems()
      at map creation time can fail percpu alloc for large map values with a warn:
      WARNING: CPU: 3 PID: 2752 at ../mm/percpu.c:892 pcpu_alloc+0x119/0xa60
      illegal size (32824) or align (8) for percpu allocation
      
      The fixes for both of these issues are different for kmalloc and prealloc modes.
      For prealloc mode allocate extra num_possible_cpus elements and store
      their pointers into extra_elems array instead of actual elements.
      Hence we can use these hidden (spare) elements not only when the map is
      full, but also during a bpf_map_update_elem() that replaces an existing
      element.
      That also improves performance, since pcpu_freelist_pop/push is avoided.
      Unfortunately this approach cannot be used for kmalloc mode which needs
      to kfree elements after rcu grace period. Therefore switch it back to normal
      kmalloc even when full and old element exists like it was prior to
      commit 6c905981 ("bpf: pre-allocate hash map elements").
      
      Add tests to check for over max_entries and large map values.
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Fixes: 6c905981 ("bpf: pre-allocate hash map elements")
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 22 March 2017 (1 commit)
  18. 13 March 2017 (1 commit)
  19. 16 February 2017 (1 commit)
  20. 11 February 2017 (6 commits)