1. 26 Apr, 2022 1 commit
  2. 12 Mar, 2022 1 commit
    • bpf, test_run: Fix packet size check for live packet mode · b6f1f780
      Toke Høiland-Jørgensen committed
      The live packet mode uses some extra space at the start of each page to
      cache data structures so they don't have to be rebuilt at every repetition.
      This space wasn't correctly accounted for in the size checking of the
      arguments supplied by userspace. In addition, the definition of the frame
      size should include the size of the skb_shared_info (as there is other
      logic that subtracts the size of this).
      
      Together, these mistakes resulted in userspace being able to trip the
      XDP_WARN() in xdp_update_frame_from_buff(), which syzbot discovered in
      short order. Fix this by changing the frame size define and adding the
      extra headroom to the bpf_prog_test_run_xdp() function. Also drop the
      max_len parameter to the page_pool init, since this is related to DMA which
      is not used for the page pool instance in PROG_TEST_RUN.
      
      Fixes: b530e9e1 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
      Reported-by: syzbot+0e91362d99386dc5de99@syzkaller.appspotmail.com
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20220310225621.53374-1-toke@redhat.com
  3. 10 Mar, 2022 3 commits
    • bpf, test_run: Use kvfree() for memory allocated with kvmalloc() · 743bec1b
      Yihao Han committed
      The memory is allocated with kvmalloc(), so the corresponding release
      function should be kvfree(), not kfree().
      
      Generated by: scripts/coccinelle/api/kfree_mismatch.cocci
      
      Fixes: b530e9e1 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
      Signed-off-by: Yihao Han <hanyihao@vivo.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20220310092828.13405-1-hanyihao@vivo.com
    • bpf: Initialise retval in bpf_prog_test_run_xdp() · eecbfd97
      Toke Høiland-Jørgensen committed
      The kernel test robot pointed out that the newly added
      bpf_test_run_xdp_live() runner doesn't set the retval in the caller (by
      design), which means that the variable can be passed uninitialised to
      bpf_test_finish(). Fix this by initialising the variable properly.
      
      Fixes: b530e9e1 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20220310110228.161869-1-toke@redhat.com
    • bpf: Add "live packet" mode for XDP in BPF_PROG_RUN · b530e9e1
      Toke Høiland-Jørgensen committed
      This adds support for running XDP programs through BPF_PROG_RUN in a mode
      that enables live packet processing of the resulting frames. Previous uses
      of BPF_PROG_RUN for XDP returned the XDP program return code and the
      modified packet data to userspace, which is useful for unit testing of XDP
      programs.
      
      The existing BPF_PROG_RUN for XDP allows userspace to set the ingress
      ifindex and RXQ number as part of the context object being passed to the
      kernel. This patch reuses that code, but adds a new mode with different
      semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES
      flag.
      
      When running BPF_PROG_RUN in this mode, the XDP program return codes will
      be honoured: returning XDP_PASS will result in the frame being injected
      into the networking stack as if it came from the selected networking
      interface, while returning XDP_TX and XDP_REDIRECT will result in the frame
      being transmitted out that interface. XDP_TX is translated into an
      XDP_REDIRECT operation to the same interface, since the real XDP_TX action
      is only possible from within the network drivers themselves, not from the
      process context where BPF_PROG_RUN is executed.
      
      Internally, this new mode of operation creates a page pool instance while
      setting up the test run, and feeds pages from that into the XDP program.
      The setup cost of this is amortised over the number of repetitions
      specified by userspace.
      
      To support the performance testing use case, we further optimise the setup
      step so that all pages in the pool are pre-initialised with the packet
      data, and pre-computed context and xdp_frame objects stored at the start of
      each page. This makes it possible to entirely avoid touching the page
      content on each XDP program invocation, and enables sending up to 9
      Mpps/core on my test box.
      
      Because the data pages are recycled by the page pool, and the test runner
      doesn't re-initialise them for each run, subsequent invocations of the XDP
      program will see the packet data in the state it was after the last time it
      ran on that particular page. This means that an XDP program that modifies
      the packet before redirecting it has to be careful about which assumptions
      it makes about the packet content, but that is only an issue for the most
      naively written programs.
      
      Enabling the new flag is only allowed when not setting ctx_out and data_out
      in the test specification, since using it means frames will be redirected
      somewhere else, so they can't be returned.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
  4. 06 Mar, 2022 2 commits
  5. 02 Mar, 2022 1 commit
  6. 10 Feb, 2022 1 commit
  7. 08 Feb, 2022 2 commits
    • bpf: test_run: Fix overflow in bpf_test_finish frags parsing · 5d1e9f43
      Stanislav Fomichev committed
      This place also uses signed min_t and passes this signed int to
      copy_to_user (which accepts an unsigned argument). I don't think
      there is an issue, but let's be consistent.
      
      Fixes: 7855e0db ("bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature")
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20220204235849.14658-2-sdf@google.com
    • bpf: test_run: Fix overflow in xdp frags parsing · 9d63b59d
      Stanislav Fomichev committed
      When kattr->test.data_size_in > INT_MAX, signed min_t will assign a
      negative value to data_len. This negative value then gets passed
      over to copy_from_user, where it is converted to a (big) unsigned value.
      
      Use unsigned min_t to avoid this overflow.
      
      usercopy: Kernel memory overwrite attempt detected to wrapped address
      (offset 0, size 18446612140539162846)!
      ------------[ cut here ]------------
      kernel BUG at mm/usercopy.c:102!
      invalid opcode: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 0 PID: 3781 Comm: syz-executor226 Not tainted 4.15.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:usercopy_abort+0xbd/0xbf mm/usercopy.c:102
      RSP: 0018:ffff8801e9703a38 EFLAGS: 00010286
      RAX: 000000000000006c RBX: ffffffff84fc7040 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff816560a2 RDI: ffffed003d2e0739
      RBP: ffff8801e9703a90 R08: 000000000000006c R09: 0000000000000001
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff84fc73a0
      R13: ffffffff84fc7180 R14: ffffffff84fc7040 R15: ffffffff84fc7040
      FS:  00007f54e0bec300(0000) GS:ffff8801f6600000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000280 CR3: 00000001e90ea000 CR4: 00000000003426f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       check_bogus_address mm/usercopy.c:155 [inline]
       __check_object_size mm/usercopy.c:263 [inline]
       __check_object_size.cold+0x8c/0xad mm/usercopy.c:253
       check_object_size include/linux/thread_info.h:112 [inline]
       check_copy_size include/linux/thread_info.h:143 [inline]
       copy_from_user include/linux/uaccess.h:142 [inline]
       bpf_prog_test_run_xdp+0xe57/0x1240 net/bpf/test_run.c:989
       bpf_prog_test_run kernel/bpf/syscall.c:3377 [inline]
       __sys_bpf+0xdf2/0x4a50 kernel/bpf/syscall.c:4679
       SYSC_bpf kernel/bpf/syscall.c:4765 [inline]
       SyS_bpf+0x26/0x50 kernel/bpf/syscall.c:4763
       do_syscall_64+0x21a/0x3e0 arch/x86/entry/common.c:305
       entry_SYSCALL_64_after_hwframe+0x46/0xbb
      
      Fixes: 1c194998 ("bpf: introduce frags support to bpf_prog_test_run_xdp()")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20220204235849.14658-1-sdf@google.com
  8. 04 Feb, 2022 1 commit
    • bpf: test_run: Fix OOB access in bpf_prog_test_run_xdp · a6763080
      Lorenzo Bianconi committed
      Fix the following kasan issue reported by syzbot:
      
      BUG: KASAN: slab-out-of-bounds in __skb_frag_set_page include/linux/skbuff.h:3242 [inline]
      BUG: KASAN: slab-out-of-bounds in bpf_prog_test_run_xdp+0x10ac/0x1150 net/bpf/test_run.c:972
      Write of size 8 at addr ffff888048c75000 by task syz-executor.5/23405
      
      CPU: 1 PID: 23405 Comm: syz-executor.5 Not tainted 5.16.0-syzkaller #0
      Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255
       __kasan_report mm/kasan/report.c:442 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
       __skb_frag_set_page include/linux/skbuff.h:3242 [inline]
       bpf_prog_test_run_xdp+0x10ac/0x1150 net/bpf/test_run.c:972
       bpf_prog_test_run kernel/bpf/syscall.c:3356 [inline]
       __sys_bpf+0x1858/0x59a0 kernel/bpf/syscall.c:4658
       __do_sys_bpf kernel/bpf/syscall.c:4744 [inline]
       __se_sys_bpf kernel/bpf/syscall.c:4742 [inline]
       __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4742
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f4ea30dd059
      RSP: 002b:00007f4ea1a52168 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
      RAX: ffffffffffffffda RBX: 00007f4ea31eff60 RCX: 00007f4ea30dd059
      RDX: 0000000000000048 RSI: 0000000020000000 RDI: 000000000000000a
      RBP: 00007f4ea313708d R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffc8367c5af R14: 00007f4ea1a52300 R15: 0000000000022000
       </TASK>
      
      Allocated by task 23405:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:437 [inline]
       ____kasan_kmalloc mm/kasan/common.c:516 [inline]
       ____kasan_kmalloc mm/kasan/common.c:475 [inline]
       __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
       kmalloc include/linux/slab.h:586 [inline]
       kzalloc include/linux/slab.h:715 [inline]
       bpf_test_init.isra.0+0x9f/0x150 net/bpf/test_run.c:411
       bpf_prog_test_run_xdp+0x2f8/0x1150 net/bpf/test_run.c:941
       bpf_prog_test_run kernel/bpf/syscall.c:3356 [inline]
       __sys_bpf+0x1858/0x59a0 kernel/bpf/syscall.c:4658
       __do_sys_bpf kernel/bpf/syscall.c:4744 [inline]
       __se_sys_bpf kernel/bpf/syscall.c:4742 [inline]
       __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4742
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff888048c74000
       which belongs to the cache kmalloc-4k of size 4096
      The buggy address is located 0 bytes to the right of
       4096-byte region [ffff888048c74000, ffff888048c75000)
      The buggy address belongs to the page:
      page:ffffea0001231c00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x48c70
      head:ffffea0001231c00 order:3 compound_mapcount:0 compound_pincount:0
      flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000010200 dead000000000100 dead000000000122 ffff888010c42140
      raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
       prep_new_page mm/page_alloc.c:2434 [inline]
       get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4165
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5389
       alloc_pages+0x1aa/0x310 mm/mempolicy.c:2271
       alloc_slab_page mm/slub.c:1799 [inline]
       allocate_slab mm/slub.c:1944 [inline]
       new_slab+0x28a/0x3b0 mm/slub.c:2004
       ___slab_alloc+0x87c/0xe90 mm/slub.c:3018
       __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3105
       slab_alloc_node mm/slub.c:3196 [inline]
       __kmalloc_node_track_caller+0x2cb/0x360 mm/slub.c:4957
       kmalloc_reserve net/core/skbuff.c:354 [inline]
       __alloc_skb+0xde/0x340 net/core/skbuff.c:426
       alloc_skb include/linux/skbuff.h:1159 [inline]
       nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:745 [inline]
       nsim_dev_trap_report drivers/net/netdevsim/dev.c:802 [inline]
       nsim_dev_trap_report_work+0x29a/0xbc0 drivers/net/netdevsim/dev.c:843
       process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1352 [inline]
       free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1404
       free_unref_page_prepare mm/page_alloc.c:3325 [inline]
       free_unref_page+0x19/0x690 mm/page_alloc.c:3404
       qlink_free mm/kasan/quarantine.c:157 [inline]
       qlist_free_all+0x6d/0x160 mm/kasan/quarantine.c:176
       kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:283
       __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:447
       kasan_slab_alloc include/linux/kasan.h:260 [inline]
       slab_post_alloc_hook mm/slab.h:732 [inline]
       slab_alloc_node mm/slub.c:3230 [inline]
       slab_alloc mm/slub.c:3238 [inline]
       kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3243
       getname_flags.part.0+0x50/0x4f0 fs/namei.c:138
       getname_flags include/linux/audit.h:323 [inline]
       getname+0x8e/0xd0 fs/namei.c:217
       do_sys_openat2+0xf5/0x4d0 fs/open.c:1208
       do_sys_open fs/open.c:1230 [inline]
       __do_sys_openat fs/open.c:1246 [inline]
       __se_sys_openat fs/open.c:1241 [inline]
       __x64_sys_openat+0x13f/0x1f0 fs/open.c:1241
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Memory state around the buggy address:
       ffff888048c74f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff888048c74f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
                         ^
       ffff888048c75080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff888048c75100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      ==================================================================
      
      Fixes: 1c194998 ("bpf: introduce frags support to bpf_prog_test_run_xdp()")
      Reported-by: syzbot+6d70ca7438345077c549@syzkaller.appspotmail.com
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/688c26f9dd6e885e58e8e834ede3f0139bb7fa95.1643835097.git.lorenzo@kernel.org
  9. 25 Jan, 2022 1 commit
  10. 22 Jan, 2022 3 commits
  11. 19 Jan, 2022 3 commits
    • selftests/bpf: Add test for race in btf_try_get_module · 46565696
      Kumar Kartikeya Dwivedi committed
      This adds a complete test case to ensure we never take references to
      modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
      ensures we never access btf->kfunc_set_tab in an inconsistent state.
      
      The test uses userfaultfd to artificially widen the race.
      
      When run on an unpatched kernel, it leads to the following splat:
      
      [root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
      [   55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
      [   55.499206] #PF: supervisor read access in kernel mode
      [   55.499855] #PF: error_code(0x0000) - not-present page
      [   55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
      [   55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
      [   55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G           OE     5.16.0-rc4+ #151
      [   55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
      [   55.504777] Workqueue: events bpf_prog_free_deferred
      [   55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
      [   55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
      [   55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
      [   55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
      [   55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
      [   55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
      [   55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
      [   55.515418] FS:  0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
      [   55.516705] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
      [   55.518672] PKRU: 55555554
      [   55.519022] Call Trace:
      [   55.519483]  <TASK>
      [   55.519884]  module_put.part.0+0x2a/0x180
      [   55.520642]  bpf_prog_free_deferred+0x129/0x2e0
      [   55.521478]  process_one_work+0x4fa/0x9e0
      [   55.522122]  ? pwq_dec_nr_in_flight+0x100/0x100
      [   55.522878]  ? rwlock_bug.part.0+0x60/0x60
      [   55.523551]  worker_thread+0x2eb/0x700
      [   55.524176]  ? __kthread_parkme+0xd8/0xf0
      [   55.524853]  ? process_one_work+0x9e0/0x9e0
      [   55.525544]  kthread+0x23a/0x270
      [   55.526088]  ? set_kthread_struct+0x80/0x80
      [   55.526798]  ret_from_fork+0x1f/0x30
      [   55.527413]  </TASK>
      [   55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
      [   55.530846] CR2: fffffbfff802548b
      [   55.531341] ---[ end trace 1af41803c054ad6d ]---
      [   55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
      [   55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
      [   55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
      [   55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
      [   55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
      [   55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
      [   55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
      [   55.542108] FS:  0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
      [   55.543260] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
      [   55.545317] PKRU: 55555554
      [   55.545671] note: kworker/0:2[83] exited with preempt_count 1
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Extend kfunc selftests · c1ff181f
      Kumar Kartikeya Dwivedi committed
      Use the prog_test kfuncs to test the referenced PTR_TO_BTF_ID kfunc
      support, and PTR_TO_CTX, PTR_TO_MEM argument passing support. Also
      test the various failure cases for invalid kfunc prototypes.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-10-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Remove check_kfunc_call callback and old kfunc BTF ID API · b202d844
      Kumar Kartikeya Dwivedi committed
      Completely remove the old code for check_kfunc_call to help it work
      with modules, and also the callback itself.
      
      The previous commit adds infrastructure to register all sets and put
      them in vmlinux or module BTF, and concatenates all related sets
      organized by the hook and the type. Once populated, these sets remain
      immutable for the lifetime of the struct btf.
      
      Also, since we don't need the 'owner' module anywhere when doing
      check_kfunc_call, drop the 'btf_modp' module parameter from
      find_kfunc_desc_btf.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-4-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  12. 21 Oct, 2021 1 commit
  13. 06 Oct, 2021 2 commits
    • bpf: selftests: Add selftests for module kfunc support · c48e51c8
      Kumar Kartikeya Dwivedi committed
      This adds selftests that test the success and failure paths for module
      kfuncs (in the presence of invalid kfunc calls) for both libbpf and
      gen_loader. It also adds a prog_test kfunc_btf_id_list so that we can
      add module BTF ID set from bpf_testmod.
      
      This also introduces a couple of test cases to verifier selftests for
      validating whether we get an error or not depending on whether an invalid
      kfunc call remains after elimination of unreachable instructions.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211002011757.311265-10-memxor@gmail.com
    • bpf: Introduce BPF support for kernel module function calls · 2357672c
      Kumar Kartikeya Dwivedi committed
      This change adds support on the kernel side to allow for BPF programs to
      call kernel module functions. Userspace will prepare an array of module
      BTF fds that is passed in during BPF_PROG_LOAD using fd_array parameter.
      In the kernel, the module BTFs are placed in the auxiliary struct for
      bpf_prog, and loaded as needed.
      
      The verifier then uses insn->off to index into the fd_array. insn->off
      0 is reserved for vmlinux BTF (for backwards compat), so userspace must
      use an fd_array index > 0 for module kfunc support. kfunc_btf_tab is
      sorted based on offset in an array, and each offset corresponds to one
      descriptor, with a max limit up to 256 such module BTFs.
      
      We also change existing kfunc_tab to distinguish each element based on
      imm, off pair as each such call will now be distinct.
      
      Another change is to the check_kfunc_call callback, which now includes a
      struct module * pointer. This is to be used in a later patch such that the
      kfunc_id and module pointer are matched for dynamically registered BTF
      sets from loadable modules, so that same kfunc_id in two modules doesn't
      lead to check_kfunc_call succeeding. For the duration of the
      check_kfunc_call, the reference to struct module exists, as it returns
      the pointer stored in kfunc_btf_tab.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211002011757.311265-2-memxor@gmail.com
  14. 30 Sep, 2021 1 commit
  15. 28 Sep, 2021 1 commit
    • bpf, test, cgroup: Use sk_{alloc,free} for test cases · 435b08ec
      Daniel Borkmann committed
      BPF test infra has some hacks in place which kzalloc() a socket and perform
      minimum init via sock_net_set() and sock_init_data(). As a result, the sk's
      skcd->cgroup is NULL since it didn't go through proper initialization as it
      would have been the case from sk_alloc(). Rather than re-adding a NULL test
      in sock_cgroup_ptr() just for this, use sk_{alloc,free}() pair for the test
      socket. The latter also allows to get rid of the bpf_sk_storage_free() special
      case.
      
      Fixes: 8520e224 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode")
      Fixes: b7a1848e ("bpf: add BPF_PROG_TEST_RUN support for flow dissector")
      Fixes: 2cb494a3 ("bpf: add tests for direct packet access from CGROUP_SKB")
      Reported-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com
      Reported-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com
      Tested-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/bpf/20210927123921.21535-2-daniel@iogearbox.net
  16. 11 Sep, 2021 1 commit
  17. 08 Sep, 2021 1 commit
  18. 17 Aug, 2021 1 commit
    • bpf: Refactor BPF_PROG_RUN into a function · fb7dd8bc
      Andrii Nakryiko committed
      Turn BPF_PROG_RUN into a proper always inlined function. No functional and
      performance changes are intended, but it makes it much easier to understand
      what's going on with how BPF programs actually get executed. It's more
      obvious what types and callbacks are expected. Also extra () around input
      parameters can be dropped, as well as `__` variable prefixes intended to avoid
      naming collisions, which makes the code simpler to read and write.
      
      This refactoring also highlighted one extra issue. BPF_PROG_RUN is both
      a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
      BPF_PROG_RUN into a function causes naming conflict compilation error. So
      rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to
      bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers of
      BPF_PROG_RUN, the macro, are switched to bpf_prog_run() explicitly.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-2-andrii@kernel.org
  19. 10 Aug, 2021 1 commit
  20. 05 Aug, 2021 1 commit
  21. 17 Jul, 2021 1 commit
    • bpf: Add ambient BPF runtime context stored in current · c7603cfa
      Andrii Nakryiko committed
      b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
      helper") fixed the problem with cgroup-local storage use in BPF by
      pre-allocating per-CPU array of 8 cgroup storage pointers to accommodate
      possible BPF program preemptions and nested executions.
      
      While this seems to work well in practice, it introduces a new and unnecessary
      failure mode in which not all BPF programs might be executed if we fail to
      find an unused slot for cgroup storage, however unlikely it is. It might also
      not be so unlikely when/if we allow sleepable cgroup BPF programs in the
      future.
      
      Further, the way that cgroup storage is implemented as ambiently-available
      property during entire BPF program execution is a convenient way to pass extra
      information to BPF program and helpers without requiring user code to pass
      around extra arguments explicitly. So it would be good to have a generic
      solution that can allow implementing this without arbitrary restrictions.
      Ideally, such a solution would work for both preemptable and sleepable BPF
      programs in exactly the same way.
      
      This patch introduces such a solution, bpf_run_ctx. It adds one pointer field
      (bpf_ctx) to task_struct. This field is maintained by BPF_PROG_RUN family of
      macros in such a way that it always stays valid throughout BPF program
      execution. BPF program preemption is handled by remembering previous
      current->bpf_ctx value locally while executing nested BPF program and
      restoring old value after nested BPF program finishes. This is handled by two
      helper functions, bpf_set_run_ctx() and bpf_reset_run_ctx(), which are
      supposed to be used before and after BPF program runs, respectively.
      
      Restoring old value of the pointer handles preemption, while bpf_run_ctx
      pointer being a property of current task_struct naturally solves this problem
      for sleepable BPF programs by "following" BPF program execution as it is
      scheduled in and out of CPU. It would even allow CPU migration of BPF
      programs, even though it's not currently allowed by BPF infra.
      
      This patch cleans up cgroup local storage handling as a first application. The
      design itself is generic, though, with bpf_run_ctx being an empty struct that
      is supposed to be embedded into a specific struct for a given BPF program type
      (bpf_cg_run_ctx in this case). Follow up patches are planned that will expand
      this mechanism for other uses within tracing BPF programs.
      
      To verify that this change doesn't revert the fix to the original cgroup
      storage issue, I ran the same repro as in the original report ([0]) and didn't
      get any problems. Replacing bpf_reset_run_ctx(old_run_ctx) with
      bpf_reset_run_ctx(NULL) triggers the issue pretty quickly (so repro does work).
      
        [0] https://lore.kernel.org/bpf/YEEvBUiJl2pJkxTd@krava/
      
      Fixes: b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
      Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210712230615.3525979-1-andrii@kernel.org
  22. 12 Jul, 2021 1 commit
    • bpf, test: fix NULL pointer dereference on invalid expected_attach_type · 5e21bb4e
      Xuan Zhuo committed
      These two types of XDP progs (BPF_XDP_DEVMAP, BPF_XDP_CPUMAP) will not be
      executed directly in the driver, therefore we should also not directly
      run them from here. To run in these two situations, there must be further
      preparations done, otherwise these may cause a kernel panic.
      
      For more details, see also dev_xdp_attach().
      
        [   46.982479] BUG: kernel NULL pointer dereference, address: 0000000000000000
        [   46.984295] #PF: supervisor read access in kernel mode
        [   46.985777] #PF: error_code(0x0000) - not-present page
        [   46.987227] PGD 800000010dca4067 P4D 800000010dca4067 PUD 10dca6067 PMD 0
        [   46.989201] Oops: 0000 [#1] SMP PTI
        [   46.990304] CPU: 7 PID: 562 Comm: a.out Not tainted 5.13.0+ #44
        [   46.992001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/24
        [   46.995113] RIP: 0010:___bpf_prog_run+0x17b/0x1710
        [   46.996586] Code: 49 03 14 cc e8 76 f6 fe ff e9 ad fe ff ff 0f b6 43 01 48 0f bf 4b 02 48 83 c3 08 89 c2 83 e0 0f c0 ea 04 02
        [   47.001562] RSP: 0018:ffffc900005afc58 EFLAGS: 00010246
        [   47.003115] RAX: 0000000000000000 RBX: ffffc9000023f068 RCX: 0000000000000000
        [   47.005163] RDX: 0000000000000000 RSI: 0000000000000079 RDI: ffffc900005afc98
        [   47.007135] RBP: 0000000000000000 R08: ffffc9000023f048 R09: c0000000ffffdfff
        [   47.009171] R10: 0000000000000001 R11: ffffc900005afb40 R12: ffffc900005afc98
        [   47.011172] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff825258a8
        [   47.013244] FS:  00007f04a5207580(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000
        [   47.015705] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [   47.017475] CR2: 0000000000000000 CR3: 0000000100182005 CR4: 0000000000770ee0
        [   47.019558] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [   47.021595] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [   47.023574] PKRU: 55555554
        [   47.024571] Call Trace:
        [   47.025424]  __bpf_prog_run32+0x32/0x50
        [   47.026296]  ? printk+0x53/0x6a
        [   47.027066]  ? ktime_get+0x39/0x90
        [   47.027895]  bpf_test_run.cold.28+0x23/0x123
        [   47.028866]  ? printk+0x53/0x6a
        [   47.029630]  bpf_prog_test_run_xdp+0x149/0x1d0
        [   47.030649]  __sys_bpf+0x1305/0x23d0
        [   47.031482]  __x64_sys_bpf+0x17/0x20
        [   47.032316]  do_syscall_64+0x3a/0x80
        [   47.033165]  entry_SYSCALL_64_after_hwframe+0x44/0xae
        [   47.034254] RIP: 0033:0x7f04a51364dd
        [   47.035133] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 48
        [   47.038768] RSP: 002b:00007fff8f9fc518 EFLAGS: 00000213 ORIG_RAX: 0000000000000141
        [   47.040344] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f04a51364dd
        [   47.041749] RDX: 0000000000000048 RSI: 0000000020002a80 RDI: 000000000000000a
        [   47.043171] RBP: 00007fff8f9fc530 R08: 0000000002049300 R09: 0000000020000100
        [   47.044626] R10: 0000000000000004 R11: 0000000000000213 R12: 0000000000401070
        [   47.046088] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
        [   47.047579] Modules linked in:
        [   47.048318] CR2: 0000000000000000
        [   47.049120] ---[ end trace 7ad34443d5be719a ]---
        [   47.050273] RIP: 0010:___bpf_prog_run+0x17b/0x1710
        [   47.051343] Code: 49 03 14 cc e8 76 f6 fe ff e9 ad fe ff ff 0f b6 43 01 48 0f bf 4b 02 48 83 c3 08 89 c2 83 e0 0f c0 ea 04 02
        [   47.054943] RSP: 0018:ffffc900005afc58 EFLAGS: 00010246
        [   47.056068] RAX: 0000000000000000 RBX: ffffc9000023f068 RCX: 0000000000000000
        [   47.057522] RDX: 0000000000000000 RSI: 0000000000000079 RDI: ffffc900005afc98
        [   47.058961] RBP: 0000000000000000 R08: ffffc9000023f048 R09: c0000000ffffdfff
        [   47.060390] R10: 0000000000000001 R11: ffffc900005afb40 R12: ffffc900005afc98
        [   47.061803] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff825258a8
        [   47.063249] FS:  00007f04a5207580(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000
        [   47.065070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [   47.066307] CR2: 0000000000000000 CR3: 0000000100182005 CR4: 0000000000770ee0
        [   47.067747] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [   47.069217] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [   47.070652] PKRU: 55555554
        [   47.071318] Kernel panic - not syncing: Fatal exception
        [   47.072854] Kernel Offset: disabled
        [   47.073683] ---[ end Kernel panic - not syncing: Fatal exception ]---
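
The rejection described above the trace can be sketched as a small guard. This is a userspace model, not the kernel code: the enum, its values, and the struct are hypothetical stand-ins for the program's expected_attach_type, and -EINVAL is an assumed error code.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's attach types; values are arbitrary. */
enum xdp_attach_type {
    ATTACH_XDP,
    ATTACH_XDP_DEVMAP,
    ATTACH_XDP_CPUMAP
};

/* Minimal model of a loaded program. */
struct xdp_prog {
    enum xdp_attach_type expected_attach_type;
};

/*
 * Refuse to test-run programs meant for devmap or cpumap entries, since
 * they need extra setup that a direct run does not perform.
 */
static int xdp_test_run_allowed(const struct xdp_prog *prog)
{
    if (prog->expected_attach_type == ATTACH_XDP_DEVMAP ||
        prog->expected_attach_type == ATTACH_XDP_CPUMAP)
        return -EINVAL;
    return 0;
}
```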
      
      Fixes: 92164774 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap")
      Fixes: fbee97fe ("bpf: Add support to attach bpf program to a devmap entry")
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: David Ahern <dsahern@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210708080409.73525-1-xuanzhuo@linux.alibaba.com
  23. 08 Jul, 2021 2 commits
  24. 19 May, 2021 2 commits
  25. 27 Mar, 2021 1 commit
  26. 26 Mar, 2021 1 commit
    • Y
      bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper · b910eaaa
      Yonghong Song committed
      Jiri Olsa reported a bug ([1]) in kernel where cgroup local
      storage pointer may be NULL in bpf_get_local_storage() helper.
      There are two issues uncovered by this bug:
        (1). kprobe or tracepoint prog incorrectly sets cgroup local storage
             before prog run,
        (2). due to change from preempt_disable to migrate_disable,
             preemption is possible and percpu storage might be overwritten
             by other tasks.
      
      Issue (1) is fixed in [2]. This patch addresses issue (2).
      The following shows how things can go wrong:
        task 1:   bpf_cgroup_storage_set() for percpu local storage
               preemption happens
        task 2:   bpf_cgroup_storage_set() for percpu local storage
               preemption happens
        task 1:   run bpf program
      
      Task 1 will effectively use the percpu local storage set by task 2,
      which will be either NULL or incorrect.
      
      Instead of just one common local storage per cpu, this patch fixes
      the issue by permitting 8 local storages per cpu, each identified
      by a task_struct pointer. This way, we allow at most 8 nested
      preemptions between bpf_cgroup_storage_set() and
      bpf_cgroup_storage_unset(). The percpu local storage slot
      is released (by calling bpf_cgroup_storage_unset()) by the same task
      after the bpf program has finished running.
      bpf_test_run() is also fixed to use the new bpf_cgroup_storage_set()
      interface.
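
The slot scheme described above can be sketched as follows. This is a userspace model under stated assumptions: one static array stands in for a single CPU's percpu data, opaque pointers stand in for task_struct and the storage, and the names storage_set, storage_unset, storage_get, and NEST_MAX, as well as the -EBUSY return, are illustrative rather than the kernel's.

```c
#include <errno.h>
#include <stddef.h>

#define NEST_MAX 8 /* at most 8 nested users per CPU, as in the commit message */

/* One slot: which task owns it and the storage that task set. */
struct storage_slot {
    const void *task; /* NULL means the slot is free */
    void *storage;
};

/* Assumption: a single array models one CPU's percpu slots. */
static struct storage_slot slots[NEST_MAX];

/* Claim a free slot for this task; fail when all slots are taken. */
static int storage_set(const void *task, void *storage)
{
    int i;

    for (i = 0; i < NEST_MAX; i++) {
        if (slots[i].task == NULL) {
            slots[i].task = task;
            slots[i].storage = storage;
            return 0;
        }
    }
    return -EBUSY;
}

/* Release the slot owned by this task, if any. */
static void storage_unset(const void *task)
{
    int i;

    for (i = 0; i < NEST_MAX; i++) {
        if (slots[i].task == task) {
            slots[i].task = NULL;
            slots[i].storage = NULL;
            return;
        }
    }
}

/* Look up the storage by owning task, so a preempting task cannot clobber it. */
static void *storage_get(const void *task)
{
    int i;

    for (i = 0; i < NEST_MAX; i++) {
        if (slots[i].task == task)
            return slots[i].storage;
    }
    return NULL;
}
```

Because lookups are keyed by the owning task, a task that was preempted between set and run still finds its own storage rather than whatever the preempting task installed.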
      
      The patch is tested on top of [2] with reproducer in [1].
      Without this patch, kernel will emit error in 2-3 minutes.
      With this patch, after one hour, still no error.
      
       [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T
       [2] https://lore.kernel.org/bpf/20210309185028.3763817-1-yhs@fb.com

      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Link: https://lore.kernel.org/bpf/20210323055146.3334476-1-yhs@fb.com
  27. 05 Mar, 2021 2 commits
  28. 14 Jan, 2021 1 commit
    • S
      bpf: Reject too big ctx_size_in for raw_tp test run · 7ac6ad05
      Song Liu committed
      syzbot reported a WARNING for allocating too big memory:
      
      WARNING: CPU: 1 PID: 8484 at mm/page_alloc.c:4976 __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5011
      Modules linked in:
      CPU: 1 PID: 8484 Comm: syz-executor862 Not tainted 5.11.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:4976
      Code: 00 00 0c 00 0f 85 a7 00 00 00 8b 3c 24 4c 89 f2 44 89 e6 c6 44 24 70 00 48 89 6c 24 58 e8 d0 d7 ff ff 49 89 c5 e9 ea fc ff ff <0f> 0b e9 b5 fd ff ff 89 74 24 14 4c 89 4c 24 08 4c 89 74 24 18 e8
      RSP: 0018:ffffc900012efb10 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 1ffff9200025df66 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000140dc0
      RBP: 0000000000140dc0 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff81b1f7e1 R11: 0000000000000000 R12: 0000000000000014
      R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000
      FS:  000000000190c880(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f08b7f316c0 CR3: 0000000012073000 CR4: 00000000001506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267
      alloc_pages include/linux/gfp.h:547 [inline]
      kmalloc_order+0x2e/0xb0 mm/slab_common.c:837
      kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853
      kmalloc include/linux/slab.h:557 [inline]
      kzalloc include/linux/slab.h:682 [inline]
      bpf_prog_test_run_raw_tp+0x4b5/0x670 net/bpf/test_run.c:282
      bpf_prog_test_run kernel/bpf/syscall.c:3120 [inline]
      __do_sys_bpf+0x1ea9/0x4f10 kernel/bpf/syscall.c:4398
      do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x440499
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffe1f3bfb18 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440499
      RDX: 0000000000000048 RSI: 0000000020000600 RDI: 000000000000000a
      RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401ca0
      R13: 0000000000401d30 R14: 0000000000000000 R15: 0000000000000000
      
      This is because we didn't filter out an overly large ctx_size_in. Fix it
      by rejecting any ctx_size_in larger than MAX_BPF_FUNC_ARGS (12) u64
      values.
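
The bound described above works out to 12 * 8 = 96 bytes. A minimal userspace sketch of such a check, assuming -EINVAL as the error code and using a hypothetical helper name (check_ctx_size_in), might look like this:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BPF_FUNC_ARGS 12 /* value quoted in the commit message */

/*
 * Reject a context size that exceeds MAX_BPF_FUNC_ARGS 64-bit values,
 * before any allocation based on it is attempted.
 */
static int check_ctx_size_in(size_t ctx_size_in)
{
    if (ctx_size_in > MAX_BPF_FUNC_ARGS * sizeof(uint64_t))
        return -EINVAL;
    return 0;
}
```

Validating the size before allocating avoids handing an attacker-controlled length to the allocator, which is what triggered the warning in the trace.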
      
      Fixes: 1b4d60ec ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
      Reported-by: syzbot+4f98876664c7337a4ae6@syzkaller.appspotmail.com
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210112234254.1906829-1-songliubraving@fb.com