1. 05 Mar, 2021 (2 commits)
  2. 14 Jan, 2021 (1 commit)
    • bpf: Reject too big ctx_size_in for raw_tp test run · 7ac6ad05
      Authored by Song Liu
syzbot reported a WARNING for an oversized memory allocation:
      
      WARNING: CPU: 1 PID: 8484 at mm/page_alloc.c:4976 __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5011
      Modules linked in:
      CPU: 1 PID: 8484 Comm: syz-executor862 Not tainted 5.11.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:4976
      Code: 00 00 0c 00 0f 85 a7 00 00 00 8b 3c 24 4c 89 f2 44 89 e6 c6 44 24 70 00 48 89 6c 24 58 e8 d0 d7 ff ff 49 89 c5 e9 ea fc ff ff <0f> 0b e9 b5 fd ff ff 89 74 24 14 4c 89 4c 24 08 4c 89 74 24 18 e8
      RSP: 0018:ffffc900012efb10 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 1ffff9200025df66 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000140dc0
      RBP: 0000000000140dc0 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff81b1f7e1 R11: 0000000000000000 R12: 0000000000000014
      R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000
      FS:  000000000190c880(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f08b7f316c0 CR3: 0000000012073000 CR4: 00000000001506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267
      alloc_pages include/linux/gfp.h:547 [inline]
      kmalloc_order+0x2e/0xb0 mm/slab_common.c:837
      kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853
      kmalloc include/linux/slab.h:557 [inline]
      kzalloc include/linux/slab.h:682 [inline]
      bpf_prog_test_run_raw_tp+0x4b5/0x670 net/bpf/test_run.c:282
      bpf_prog_test_run kernel/bpf/syscall.c:3120 [inline]
      __do_sys_bpf+0x1ea9/0x4f10 kernel/bpf/syscall.c:4398
      do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x440499
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffe1f3bfb18 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440499
      RDX: 0000000000000048 RSI: 0000000020000600 RDI: 000000000000000a
      RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401ca0
      R13: 0000000000401d30 R14: 0000000000000000 R15: 0000000000000000
      
This is because we didn't filter out a too big ctx_size_in. Fix it by
rejecting any ctx_size_in bigger than MAX_BPF_FUNC_ARGS (12) u64
values.
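
A minimal sketch of the bound being described, assuming the raw_tp test
context is an array of u64 tracepoint arguments (check_raw_tp_ctx_size is
a hypothetical helper name for illustration, not the upstream function):

  #include <linux/types.h>
  #include <linux/errno.h>
  #include <linux/bpf.h>    /* MAX_BPF_FUNC_ARGS */

  /* Reject a user-supplied context bigger than the maximum number of
   * u64 arguments a raw tracepoint program can ever receive.
   */
  static int check_raw_tp_ctx_size(u32 ctx_size_in)
  {
          if (ctx_size_in > MAX_BPF_FUNC_ARGS * sizeof(u64))
                  return -EINVAL;
          return 0;
  }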
      
      Fixes: 1b4d60ec ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
      Reported-by: syzbot+4f98876664c7337a4ae6@syzkaller.appspotmail.com
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210112234254.1906829-1-songliubraving@fb.com
  3. 09 Jan, 2021 (2 commits)
  4. 30 Sep, 2020 (1 commit)
    • bpf: fix raw_tp test run in preempt kernel · 963ec27a
      Authored by Song Liu
On a preemptible kernel, BPF_PROG_TEST_RUN on a raw_tp program triggers:
      
[   35.874974] BUG: using smp_processor_id() in preemptible [00000000] code: new_name/87
      [   35.893983] caller is bpf_prog_test_run_raw_tp+0xd4/0x1b0
      [   35.900124] CPU: 1 PID: 87 Comm: new_name Not tainted 5.9.0-rc6-g615bd02bf #1
[   35.907358] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [   35.916941] Call Trace:
      [   35.919660]  dump_stack+0x77/0x9b
      [   35.923273]  check_preemption_disabled+0xb4/0xc0
      [   35.928376]  bpf_prog_test_run_raw_tp+0xd4/0x1b0
      [   35.933872]  ? selinux_bpf+0xd/0x70
      [   35.937532]  __do_sys_bpf+0x6bb/0x21e0
      [   35.941570]  ? find_held_lock+0x2d/0x90
      [   35.945687]  ? vfs_write+0x150/0x220
      [   35.949586]  do_syscall_64+0x2d/0x40
      [   35.953443]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fix this by calling migrate_disable() before smp_processor_id().
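
A sketch of the pattern the fix applies (not the verbatim patch):

  /* Disabling migration pins the task to the current CPU, so
   * smp_processor_id() is safe even on a preemptible kernel.
   */
  migrate_disable();
  cpu = smp_processor_id();
  /* ... run the raw_tp test program on this CPU ... */
  migrate_enable();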
      
      Fixes: 1b4d60ec ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  5. 29 Sep, 2020 (1 commit)
  6. 24 Aug, 2020 (1 commit)
  7. 04 Aug, 2020 (2 commits)
  8. 01 Jul, 2020 (1 commit)
    • bpf: Add tests for PTR_TO_BTF_ID vs. null comparison · d923021c
      Authored by Yonghong Song
Add two tests for PTR_TO_BTF_ID vs. null ptr comparison,
one for PTR_TO_BTF_ID in the ctx structure and the
other for PTR_TO_BTF_ID after one level of pointer chasing.
In both cases, the test ensures the comparison is not
removed.
      
      For example, for this test
       struct bpf_fentry_test_t {
           struct bpf_fentry_test_t *a;
       };
       int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
       {
           if (arg == 0)
               test7_result = 1;
           return 0;
       }
Before the previous verifier change, we had the following xlated code:
        int test7(long long unsigned int * ctx):
        ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
           0: (79) r1 = *(u64 *)(r1 +0)
        ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
           1: (b4) w0 = 0
           2: (95) exit
      After the previous verifier change, we have:
        int test7(long long unsigned int * ctx):
        ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
           0: (79) r1 = *(u64 *)(r1 +0)
        ; if (arg == 0)
           1: (55) if r1 != 0x0 goto pc+4
        ; test7_result = 1;
           2: (18) r1 = map[id:6][0]+48
           4: (b7) r2 = 1
           5: (7b) *(u64 *)(r1 +0) = r2
        ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
           6: (b4) w0 = 0
           7: (95) exit
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200630171241.2523875-1-yhs@fb.com
  9. 19 May, 2020 (1 commit)
  10. 15 May, 2020 (1 commit)
  11. 29 Mar, 2020 (1 commit)
  12. 05 Mar, 2020 (2 commits)
  13. 04 Mar, 2020 (1 commit)
  14. 25 Feb, 2020 (1 commit)
  15. 19 Dec, 2019 (1 commit)
  16. 14 Dec, 2019 (2 commits)
  17. 11 Dec, 2019 (1 commit)
  18. 10 Dec, 2019 (1 commit)
  19. 19 Nov, 2019 (1 commit)
  20. 16 Nov, 2019 (1 commit)
  21. 16 Oct, 2019 (1 commit)
  22. 26 Jul, 2019 (2 commits)
  23. 31 May, 2019 (1 commit)
  24. 28 Apr, 2019 (1 commit)
    • bpf: Introduce bpf sk local storage · 6ac99e8f
      Authored by Martin KaFai Lau
      After allowing a bpf prog to
      - directly read the skb->sk ptr
      - get the fullsock bpf_sock by "bpf_sk_fullsock()"
      - get the bpf_tcp_sock by "bpf_tcp_sock()"
      - get the listener sock by "bpf_get_listener_sock()"
- avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock"
  into different bpf running contexts,

this patch is another effort to make bpf network programming more
intuitive (together with a memory and performance benefit).
      
When a bpf prog needs to store data for a sk, the current practice is to
define a map with the usual 4-tuple (src/dst ip/port) as the key.
If multiple bpf progs need to store different sk data, multiple maps
have to be defined, wasting memory storing the duplicated
keys (i.e. the 4-tuple here) in each of the bpf maps.
      [ The smallest key could be the sk pointer itself which requires
        some enhancement in the verifier and it is a separate topic. ]
      
Also, the bpf prog needs to clean up the elem when the sk is freed.
Otherwise, the bpf map will quickly become full and unusable.
      The sk-free tracking currently could be done during sk state
      transition (e.g. BPF_SOCK_OPS_STATE_CB).
      
The size of the map needs to be predefined, which usually ends up
over-provisioned in production.  Even if the map were resizable,
since sockets naturally come and go, this potential resize
operation is arguably redundant if the data can be attached directly
to the sk itself instead of proxied through a bpf map.
      
      This patch introduces sk->sk_bpf_storage to provide local storage space
      at sk for bpf prog to use.  The space will be allocated when the first bpf
      prog has created data for this particular sk.
      
The design optimizes the bpf prog's lookup (optionally followed by
an inline update).  bpf_spin_lock should be used if the inline update needs
      to be protected.
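
For example, a value layout allowing such a protected inline update could
look like this (a sketch with made-up names, not code from this patch):

  struct sk_val {
          struct bpf_spin_lock lock;
          __u64 cnt;
  };

  /* in a bpf prog, after a successful bpf_sk_storage_get(): */
  bpf_spin_lock(&val->lock);
  val->cnt++;
  bpf_spin_unlock(&val->lock);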
      
      BPF_MAP_TYPE_SK_STORAGE:
      -----------------------
      To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in
      this patch) needs to be created.  Multiple BPF_MAP_TYPE_SK_STORAGE maps can
      be created to fit different bpf progs' needs.  The map enforces
BTF to allow printing the sk-local-storage during a system-wide
      sk dump (e.g. "ss -ta") in the future.
      
The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not to lookup/update/delete
"sk-local-storage" data from a particular sk.
Think of the map as the meta-data (or "type") of a "sk-local-storage".  This
particular "type" of "sk-local-storage" data can then be stored in any sk.
      
      The main purposes of this map are mostly:
      1. Define the size of a "sk-local-storage" type.
2. Provide a similar userspace syscall API as a regular map (e.g. lookup/update,
   map-id, map-btf, etc.)
      3. Keep track of all sk's storages of this "type" and clean them up
         when the map is freed.
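
As an illustration, such a map can be declared from bpf prog C code roughly
as follows (using today's libbpf BTF-defined map syntax, which postdates
this patch; sk_cnt_map is a made-up name):

  struct {
          __uint(type, BPF_MAP_TYPE_SK_STORAGE);
          __uint(map_flags, BPF_F_NO_PREALLOC);  /* required for sk storage */
          __type(key, int);                      /* socket fd on syscall side */
          __type(value, __u32);
  } sk_cnt_map SEC(".maps");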
      
      sk->sk_bpf_storage:
      ------------------
      The main lookup/update/delete is done on sk->sk_bpf_storage (which
      is a "struct bpf_sk_storage").  When doing a lookup,
      the "map" pointer is now used as the "key" to search on the
      sk_storage->list.  The "map" pointer is actually serving
      as the "type" of the "sk-local-storage" that is being
      requested.
      
      To allow very fast lookup, it should be as fast as looking up an
      array at a stable-offset.  At the same time, it is not ideal to
      set a hard limit on the number of sk-local-storage "type" that the
      system can have.  Hence, this patch takes a cache approach.
The last search result from sk_storage->list is cached in
sk_storage->cache[], which is a fixed-size array.  Each
"sk-local-storage" type has a stable offset into the cache[] array.
      In the future, a map's flag could be introduced to do cache
      opt-out/enforcement if it became necessary.
      
      The cache size is 16 (i.e. 16 types of "sk-local-storage").
Programs can share a map.  On the program side, having a few bpf_progs
      running in the networking hotpath is already a lot.  The bpf_prog
      should have already consolidated the existing sock-key-ed map usage
      to minimize the map lookup penalty.  16 has enough runway to grow.
      
      All sk-local-storage data will be removed from sk->sk_bpf_storage
      during sk destruction.
      
      bpf_sk_storage_get() and bpf_sk_storage_delete():
      ------------------------------------------------
      Instead of using bpf_map_(lookup|update|delete)_elem(),
the bpf prog needs to use the new helpers bpf_sk_storage_get() and
bpf_sk_storage_delete().  The verifier can then enforce the
ARG_PTR_TO_SOCKET argument.  bpf_sk_storage_get() also allows
"creating" a new elem if one does not exist in the sk, via the
new BPF_SK_STORAGE_GET_F_CREATE flag.  An optional value can also be
provided as the initial value during BPF_SK_STORAGE_GET_F_CREATE.
The BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock.  Together,
these eliminate the potential use cases for an equivalent
bpf_map_update_elem() API (for bpf_prog) in this patch.
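
A sketch of how a bpf prog could use these helpers, reusing the made-up
sk_cnt_map from the earlier sketch and assuming the usual <linux/bpf.h>
and <bpf/bpf_helpers.h> includes:

  SEC("cgroup_skb/egress")
  int count_egress(struct __sk_buff *skb)
  {
          struct bpf_sock *sk = skb->sk;
          __u32 *cnt;

          if (!sk)
                  return 1;               /* allow the packet */
          sk = bpf_sk_fullsock(sk);       /* helper needs a fullsock */
          if (!sk)
                  return 1;
          cnt = bpf_sk_storage_get(&sk_cnt_map, sk, NULL,
                                   BPF_SK_STORAGE_GET_F_CREATE);
          if (cnt)
                  __sync_fetch_and_add(cnt, 1);
          return 1;
  }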
      
      Misc notes:
      ----------
      1. map_get_next_key is not supported.  From the userspace syscall
         perspective,  the map has the socket fd as the key while the map
         can be shared by pinned-file or map-id.
      
         Since btf is enforced, the existing "ss" could be enhanced to pretty
         print the local-storage.
      
         Supporting a kernel defined btf with 4 tuples as the return key could
         be explored later also.
      
2. The sk->sk_lock cannot be acquired.  Atomic operations are used instead,
   e.g. cmpxchg is done on the sk->sk_bpf_storage ptr (see the sketch
   after these notes).
         Please refer to the source code comments for the details in
         synchronization cases and considerations.
      
      3. The mem is charged to the sk->sk_omem_alloc as the sk filter does.
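
Sketch of the lock-free publication mentioned in note 2 (simplified; the
real code also deals with rcu and error paths):

  /* Install freshly allocated storage only if the sk has none yet. */
  prev = cmpxchg(&sk->sk_bpf_storage, NULL, new_storage);
  if (prev)               /* lost the race; another prog installed one */
          kfree(new_storage);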
      
      Benchmark:
      ---------
      Here is the benchmark data collected by turning on
      the "kernel.bpf_stats_enabled" sysctl.
      Two bpf progs are tested:
      
One bpf prog uses the usual bpf hashmap (max_entries = 8192) with the
sk ptr as the key.  (The verifier is modified to support the sk ptr as the
key, which should have shortened the key lookup time.)
      
      Another bpf prog is with the new BPF_MAP_TYPE_SK_STORAGE.
      
      Both are storing a "u32 cnt", do a lookup on "egress_skb/cgroup" for
      each egress skb and then bump the cnt.  netperf is used to drive
      data with 4096 connected UDP sockets.
      
BPF_MAP_TYPE_HASH with a modified verifier (152ns per bpf run)
      27: cgroup_skb  name egress_sk_map  tag 74f56e832918070b run_time_ns 58280107540 run_cnt 381347633
          loaded_at 2019-04-15T13:46:39-0700  uid 0
          xlated 344B  jited 258B  memlock 4096B  map_ids 16
          btf_id 5
      
      BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run)
      30: cgroup_skb  name egress_sk_stora  tag d4aa70984cc7bbf6 run_time_ns 25617093319 run_cnt 390989739
          loaded_at 2019-04-15T13:47:54-0700  uid 0
          xlated 168B  jited 156B  memlock 4096B  map_ids 17
          btf_id 6
      
Here is a high-level picture of how the objects are organized:
      
       sk
    ┌──────┐
    │      │
    │      │
    │      │
    │*sk_bpf_storage─────▶ bpf_sk_storage
    └──────┘                 ┌───────┐
                 ┌───────────┤ list  │
                 │           │       │
                 │           │       │
                 │           │       │
                 │           └───────┘
                 │
                 │     elem
                 │  ┌────────┐
                 ├─▶│ snode  │
                 │  ├────────┤
                 │  │  data  │          bpf_map
                 │  ├────────┤        ┌─────────┐
                 │  │map_node│◀─┬─────┤  list   │
                 │  └────────┘  │     │         │
                 │              │     │         │
                 │     elem     │     │         │
                 │  ┌────────┐  │     └─────────┘
                 └─▶│ snode  │  │
                    ├────────┤  │
   bpf_map          │  data  │  │
 ┌─────────┐        ├────────┤  │
 │  list   ├───────▶│map_node│  │
 │         │        └────────┘  │
 │         │                    │
 │         │           elem     │
 └─────────┘        ┌────────┐  │
                 ┌─▶│ snode  │  │
                 │  ├────────┤  │
                 │  │  data  │  │
                 │  ├────────┤  │
                 │  │map_node│◀─┘
                 │  └────────┘
                 │
                 │
                 │          ┌───────┐
     sk          └──────────▶ list  │
  ┌──────┐                  │       │
  │      │                  │       │
  │      │                  │       │
  │      │                  └───────┘
  │*sk_bpf_storage─────────▶bpf_sk_storage
  └──────┘
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  25. 27 Apr, 2019 (1 commit)
  26. 24 Apr, 2019 (3 commits)
  27. 12 Apr, 2019 (1 commit)
  28. 11 Apr, 2019 (1 commit)
    • bpf: support input __sk_buff context in BPF_PROG_TEST_RUN · b0b9395d
      Authored by Stanislav Fomichev
Add a new set of arguments to bpf_attr for BPF_PROG_TEST_RUN:
      * ctx_in/ctx_size_in - input context
      * ctx_out/ctx_size_out - output context
      
The intended use case is to pass some metadata to test runs that
operate on an skb (this has been brought up at a recent LPC).
      
For programs that use bpf_prog_test_run_skb, support __sk_buff input and
output.  Initially, from the input __sk_buff, copy _only_ cb and priority
into the skb; all other non-zero fields are rejected (with EINVAL).
If the user has set ctx_out/ctx_size_out, copy the potentially modified
__sk_buff back to userspace.
      
      We require all fields of input __sk_buff except the ones we explicitly
      support to be set to zero. The expectation is that in the future we might
      add support for more fields and we want to fail explicitly if the user
      runs the program on the kernel where we don't yet support them.
      
      The API is intentionally vague (i.e. we don't explicitly add __sk_buff
      to bpf_attr, but ctx_in) to potentially let other test_run types use
      this interface in the future (this can be xdp_md for xdp types for
      example).
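
As an illustration, with today's libbpf options API (which wraps the new
bpf_attr fields but postdates this patch), a caller could exercise this
roughly as follows (prog_fd is an already-loaded skb program):

  struct __sk_buff ctx_in = {};
  struct __sk_buff ctx_out = {};
  char pkt[64] = {};              /* minimal dummy packet payload */

  ctx_in.cb[0] = 42;              /* metadata visible to the program */
  ctx_in.priority = 6;

  LIBBPF_OPTS(bpf_test_run_opts, opts,
          .data_in      = pkt,
          .data_size_in = sizeof(pkt),
          .ctx_in       = &ctx_in,
          .ctx_size_in  = sizeof(ctx_in),
          .ctx_out      = &ctx_out,
          .ctx_size_out = sizeof(ctx_out),
  );

  err = bpf_prog_test_run_opts(prog_fd, &opts);
  /* on success, ctx_out reflects any cb/priority changes the prog made */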
      
      v4:
        * don't copy more than allowed in bpf_ctx_init [Martin]
      
      v3:
        * handle case where ctx_in is NULL, but ctx_out is not [Martin]
        * convert size==0 checks to ptr==NULL checks and add some extra ptr
          checks [Martin]
      
      v2:
        * Addressed comments from Martin Lau
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  29. 09 Mar, 2019 (1 commit)
  30. 26 Feb, 2019 (1 commit)
  31. 19 Feb, 2019 (1 commit)
  32. 29 Jan, 2019 (1 commit)