1. 20 Jul, 2020 1 commit
  2. 09 Jul, 2020 2 commits
    • bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok() · 63960260
      Kees Cook committed
      When evaluating access control over kallsyms visibility, credentials at
      open() time need to be used, not the "current" creds (though in BPF's
      case, this has likely always been the same). Plumb access to associated
      file->f_cred down through bpf_dump_raw_ok() and its callers now that
      kallsyms_show_value() has been refactored to take struct cred.
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: bpf@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 7105e828 ("bpf: allow for correlation of maps and helpers in dump")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      63960260
    • kallsyms: Refactor kallsyms_show_value() to take cred · 16025184
      Kees Cook committed
      In order to perform future tests against the cred saved during open(),
      switch kallsyms_show_value() to operate on a cred, and have all current
      callers pass current_cred(). This makes it very obvious where callers
      are checking the wrong credential in their "read" contexts. These will
      be fixed in the coming patches.
      
      Additionally switch return value to bool, since it is always used as a
      direct permission check, not a 0-on-success, negative-on-error style
      function return.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      16025184
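The shape of this refactor can be sketched in userspace C. This is a minimal mock, not the kernel code: `struct mock_cred`, `struct mock_file`, `mock_show_value()` and `mock_read_may_show()` are hypothetical names, and the real check also depends on settings such as kptr_restrict that are left out here. It only shows a bool-returning permission check that takes the credential explicitly, so a read path can pass the credential saved at open() time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct cred. */
struct mock_cred {
	bool cap_syslog;	/* does this credential hold CAP_SYSLOG? */
};

/* Hypothetical stand-in for struct file: remembers who opened it. */
struct mock_file {
	const struct mock_cred *f_cred;
};

/* Direct permission check: returns bool and takes the cred explicitly. */
static bool mock_show_value(const struct mock_cred *cred)
{
	assert(cred != NULL);
	return cred->cap_syslog;
}

/* Read-side check: consult the opener's cred, not the reader's. */
static bool mock_read_may_show(const struct mock_file *file)
{
	return mock_show_value(file->f_cred);
}
```

Because the credential is an argument, a caller that passes the wrong one is visible at the call site, which is the point the commit message makes about "read" contexts.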
  3. 08 May, 2020 2 commits
    • crypto: lib/sha1 - fold linux/cryptohash.h into crypto/sha.h · 228c4f26
      Eric Biggers committed
      <linux/cryptohash.h> sounds very generic and important, like it's the
      header to include if you're doing cryptographic hashing in the kernel.
      But actually it only includes the library implementation of the SHA-1
      compression function (not even the full SHA-1).  This should basically
      never be used anymore; SHA-1 is no longer considered secure, and there
      are much better ways to do cryptographic hashing in the kernel.
      
      Remove this header and fold it into <crypto/sha.h> which already
      contains constants and functions for SHA-1 (along with SHA-2).
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      228c4f26
    • crypto: lib/sha1 - rename "sha" to "sha1" · 6b0b0fa2
      Eric Biggers committed
      The library implementation of the SHA-1 compression function is
      confusingly called just "sha_transform()".  Alongside it are some "SHA_"
      constants and "sha_init()".  Presumably these are left over from a time
      when SHA just meant SHA-1.  But now there are also SHA-2 and SHA-3, and
      moreover SHA-1 is now considered insecure and thus shouldn't be used.
      
      Therefore, rename these functions and constants to make it very clear
      that they are for SHA-1.  Also add a comment to make it clear that these
      shouldn't be used.
      
      For the extra-misleadingly named "SHA_MESSAGE_BYTES", rename it to
      SHA1_BLOCK_SIZE and define it to just '64' rather than '(512/8)' so that
      it matches the same definition in <crypto/sha.h>.  This prepares for
      merging <linux/cryptohash.h> into <crypto/sha.h>.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      6b0b0fa2
  4. 05 May, 2020 1 commit
    • bpf: Avoid gcc-10 stringop-overflow warning in struct bpf_prog · d26c0cc5
      Arnd Bergmann committed
      gcc-10 warns about accesses to zero-length arrays:
      
      kernel/bpf/core.c: In function 'bpf_patch_insn_single':
      cc1: warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=]
      In file included from kernel/bpf/core.c:21:
      include/linux/filter.h:550:20: note: at offset 0 to object 'insnsi' with size 0 declared here
        550 |   struct bpf_insn  insnsi[0];
            |                    ^~~~~~
      
      In this case, we really want to have two flexible-array members,
      but that is not possible. Removing the union to make insnsi a
      flexible-array member while leaving insns as a zero-length array
      fixes the warning, as nothing writes to the other one in that way.
      
      This trick only works on linux-3.18 or higher, as older versions
      had additional members in the union.
      
      Fixes: 60a3b225 ("net: bpf: make eBPF interpreter images read-only")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200430213101.135134-6-arnd@arndb.de
      d26c0cc5
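To illustrate the flexible-array member the commit switches to, here is a small userspace sketch. The names `struct mock_insn`, `struct mock_prog`, `mock_prog_alloc()` and `mock_prog_patch()` are hypothetical and do not mirror the real struct bpf_prog layout; the sketch only shows a trailing `[]` member that is sized at allocation time and written through without the compiler assuming a zero-sized object.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 8-byte instruction, standing in for struct bpf_insn. */
struct mock_insn {
	uint64_t raw;
};

struct mock_prog {
	uint32_t len;			/* number of instructions */
	struct mock_insn insnsi[];	/* flexible-array member, must be last */
};

/* Allocate the header plus room for len trailing instructions. */
static struct mock_prog *mock_prog_alloc(uint32_t len)
{
	size_t bytes = (size_t)len * sizeof(struct mock_insn);
	struct mock_prog *prog = malloc(sizeof(*prog) + bytes);

	if (!prog)
		return NULL;
	prog->len = len;
	memset(prog->insnsi, 0, bytes);
	return prog;
}

/* Overwrite one instruction; the write lands in the trailing storage. */
static void mock_prog_patch(struct mock_prog *prog, uint32_t idx, uint64_t raw)
{
	assert(idx < prog->len);
	prog->insnsi[idx].raw = raw;
}
```

A caller would allocate with `mock_prog_alloc(n)`, patch entries by index, and release the whole object with a single `free()`.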
  5. 26 Apr, 2020 1 commit
  6. 14 Mar, 2020 2 commits
  7. 25 Feb, 2020 4 commits
  8. 17 Jan, 2020 1 commit
    • xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths · 1d233886
      Toke Høiland-Jørgensen committed
      Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
      we can re-use the bulking for the non-map version of the bpf_redirect()
      helper. This is a simple matter of having xdp_do_redirect_slow() queue the
      frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
      
      Unfortunately we can't make the bpf_redirect() helper return an error if
      the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
      have a reference to the network namespace of the ingress device at the time
      the helper is called. So we have to leave it as-is and keep the device
      lookup in xdp_do_redirect_slow().
      
      This leaves less reason to have the non-map redirect code in a
      separate function, so we get rid of the xdp_do_redirect_slow() function
      entirely. This does lose us the tracepoint disambiguation, but fortunately
      the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
      entry structures. This means both can contain a map index, so we can just
      amend the tracepoint definitions so we always emit the xdp_redirect(_err)
      tracepoints, but with the map ID only populated if a map is present. This
      means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
      the definitions around in case someone is still listening for them.
      
      With this change, the performance of the xdp_redirect sample program goes
      from 5Mpps to 8.4Mpps (a 68% increase).
      
      Since the flush functions are no longer map-specific, rename the flush()
      functions to drop _map from their names. One of the renamed functions is
      the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
      keep from having to update all drivers, use a #define to keep the old name
      working, and only update the virtual drivers in this patch.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
      1d233886
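The idea of queueing frames and transmitting them in batches, rather than sending each one immediately, can be sketched as follows. This is a simplified userspace model under stated assumptions: `struct mock_bulk_queue`, `MOCK_BULK_SIZE`, `mock_bq_enqueue()` and `mock_bq_flush()` are hypothetical names, and the "transmit" step is reduced to counters so the batching behaviour is observable.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_BULK_SIZE 16	/* hypothetical batch size */

struct mock_bulk_queue {
	const void *frames[MOCK_BULK_SIZE];
	size_t count;		/* frames currently queued */
	size_t sent;		/* frames handed to the device so far */
	size_t flushes;		/* number of batched transmits */
};

/* Hand every queued frame to the device in one batch. */
static void mock_bq_flush(struct mock_bulk_queue *bq)
{
	if (bq->count == 0)
		return;
	bq->sent += bq->count;
	bq->flushes++;
	bq->count = 0;
}

/* Queue a frame; a full queue is flushed first to make room. */
static void mock_bq_enqueue(struct mock_bulk_queue *bq, const void *frame)
{
	assert(frame != NULL);
	if (bq->count == MOCK_BULK_SIZE)
		mock_bq_flush(bq);
	bq->frames[bq->count++] = frame;
}
```

In this model a caller enqueues frames as they are redirected and calls the flush function once at the end of a processing round, so the per-frame cost of a transmit is amortised over the batch.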
  9. 10 Jan, 2020 1 commit
    • bpf: tcp: Support tcp_congestion_ops in bpf · 0baf26b0
      Martin KaFai Lau committed
      This patch makes "struct tcp_congestion_ops" to be the first user
      of BPF STRUCT_OPS.  It allows implementing a tcp_congestion_ops
      in bpf.
      
      The BPF implemented tcp_congestion_ops can be used like
      regular kernel tcp-cc through sysctl and setsockopt.  e.g.
      [root@arch-fb-vm1 bpf]# sysctl -a | egrep congestion
      net.ipv4.tcp_allowed_congestion_control = reno cubic bpf_cubic
      net.ipv4.tcp_available_congestion_control = reno bic cubic bpf_cubic
      net.ipv4.tcp_congestion_control = bpf_cubic
      
      There have been attempts to move the TCP CC to the user space
      (e.g. CCP in TCP).   The common arguments are faster turn around,
      get away from long-tail kernel versions in production...etc,
      which are legit points.
      
      BPF has been the continuous effort to join both kernel and
      userspace upsides together (e.g. XDP to gain the performance
      advantage without bypassing the kernel).  The recent BPF
      advancements (in particular BTF-aware verifier, BPF trampoline,
      BPF CO-RE...) made implementing kernel struct ops (e.g. tcp cc)
      possible in BPF.  It allows a faster turnaround for testing algorithm
      in the production while leveraging the existing (and continue growing)
      BPF feature/framework instead of building one specifically for
      userspace TCP CC.
      
      This patch allows write access to a few fields in tcp-sock
      (in bpf_tcp_ca_btf_struct_access()).
      
      The optional "get_info" is unsupported now.  It can be added
      later.  One possible way is to output the info with a btf-id
      to describe the content.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200109003508.3856115-1-kafai@fb.com
      0baf26b0
  10. 20 Dec, 2019 1 commit
  11. 14 Dec, 2019 1 commit
  12. 10 Dec, 2019 1 commit
  13. 02 Dec, 2019 1 commit
  14. 25 Nov, 2019 2 commits
  15. 16 Nov, 2019 1 commit
  16. 23 Oct, 2019 1 commit
    • bpf: Fix use after free in subprog's jited symbol removal · cd7455f1
      Daniel Borkmann committed
      syzkaller managed to trigger the following crash:
      
        [...]
        BUG: unable to handle page fault for address: ffffc90001923030
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD aa551067 P4D aa551067 PUD aa552067 PMD a572b067 PTE 80000000a1173163
        Oops: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 7982 Comm: syz-executor912 Not tainted 5.4.0-rc3+ #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:bpf_jit_binary_hdr include/linux/filter.h:787 [inline]
        RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:531 [inline]
        RIP: 0010:bpf_tree_comp kernel/bpf/core.c:600 [inline]
        RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
        RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
        RIP: 0010:bpf_prog_kallsyms_find kernel/bpf/core.c:674 [inline]
        RIP: 0010:is_bpf_text_address+0x184/0x3b0 kernel/bpf/core.c:709
        [...]
        Call Trace:
         kernel_text_address kernel/extable.c:147 [inline]
         __kernel_text_address+0x9a/0x110 kernel/extable.c:102
         unwind_get_return_address+0x4c/0x90 arch/x86/kernel/unwind_frame.c:19
         arch_stack_walk+0x98/0xe0 arch/x86/kernel/stacktrace.c:26
         stack_trace_save+0xb6/0x150 kernel/stacktrace.c:123
         save_stack mm/kasan/common.c:69 [inline]
         set_track mm/kasan/common.c:77 [inline]
         __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:510
         kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518
         slab_post_alloc_hook mm/slab.h:584 [inline]
         slab_alloc mm/slab.c:3319 [inline]
         kmem_cache_alloc+0x1f5/0x2e0 mm/slab.c:3483
         getname_flags+0xba/0x640 fs/namei.c:138
         getname+0x19/0x20 fs/namei.c:209
         do_sys_open+0x261/0x560 fs/open.c:1091
         __do_sys_open fs/open.c:1115 [inline]
         __se_sys_open fs/open.c:1110 [inline]
         __x64_sys_open+0x87/0x90 fs/open.c:1110
         do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [...]
      
      After further debugging it turns out that we walk kallsyms while in parallel
      we tear down a BPF program which contains subprograms that have been JITed
      though the program itself has not been fully exposed and is eventually bailing
      out with error.
      
      The bpf_prog_kallsyms_del_subprogs() in bpf_prog_load()'s error path removes
      the symbols, however, bpf_prog_free() tears down the JIT memory too early via
      scheduled work. Instead, it needs to properly respect RCU grace period as the
      kallsyms walk for BPF is under RCU.
      
      Fix it by refactoring __bpf_prog_put()'s tear down and reuse it in our error
      path where we defer final destruction when we have subprogs in the program.
      
      Fixes: 7d1982b4 ("bpf: fix panic in prog load calls cleanup")
      Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
      Reported-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/bpf/55f6367324c2d7e9583fa9ccf5385dcbba0d7a6e.1571752452.git.daniel@iogearbox.net
      cd7455f1
  17. 17 Oct, 2019 2 commits
  18. 16 Sep, 2019 1 commit
  19. 24 Jul, 2019 1 commit
    • bpf: fix narrower loads on s390 · d9b8aada
      Ilya Leoshkevich committed
      The very first check in test_pkt_md_access is failing on s390, which
      happens because loading a part of a struct __sk_buff field produces
      an incorrect result.
      
      The preprocessed code of the check is:
      
      {
      	__u8 tmp = *((volatile __u8 *)&skb->len +
      		((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8)));
      	if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2;
      };
      
      clang generates the following code for it:
      
            0:	71 21 00 03 00 00 00 00	r2 = *(u8 *)(r1 + 3)
            1:	61 31 00 00 00 00 00 00	r3 = *(u32 *)(r1 + 0)
            2:	57 30 00 00 00 00 00 ff	r3 &= 255
            3:	5d 23 00 1d 00 00 00 00	if r2 != r3 goto +29 <LBB0_10>
      
      Finally, verifier transforms it to:
      
        0: (61) r2 = *(u32 *)(r1 +104)
        1: (bc) w2 = w2
        2: (74) w2 >>= 24
        3: (bc) w2 = w2
        4: (54) w2 &= 255
        5: (bc) w2 = w2
      
      The problem is that when verifier emits the code to replace a partial
      load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of
      struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift and a
      bitwise AND, it assumes that the machine is little endian and
      incorrectly decides to use a shift.
      
      Adjust shift count calculation to account for endianness.
      
      Fixes: 31fd8581 ("bpf: permits narrower load from bpf program context fields")
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      d9b8aada
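The endianness-aware shift calculation can be sketched in plain C. This is an illustration rather than the kernel's helper: `narrow_load_shift()` and `narrow_load()` are hypothetical names, and endianness is passed as a parameter instead of being a compile-time property, so both cases can be compared on one machine. For the 1-byte load at offset 3 from the commit message, a little-endian layout needs a shift of 24, while a big-endian layout such as s390 needs a shift of 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Shift needed to extract a narrow load of `size` bytes at byte offset
 * `off` from a full load of `size_default` bytes (a power of two).
 */
static uint8_t narrow_load_shift(uint32_t off, uint32_t size,
				 uint32_t size_default, bool big_endian)
{
	uint32_t load_off = off & (size_default - 1);

	assert(size_default != 0 && (size_default & (size_default - 1)) == 0);
	assert(load_off + size <= size_default);

	if (big_endian)
		return (uint8_t)((size_default - (load_off + size)) * 8);
	return (uint8_t)(load_off * 8);
}

/* Emulate the rewritten sequence: full 32-bit load, shift, then mask. */
static uint32_t narrow_load(uint32_t full, uint32_t off, uint32_t size,
			    bool big_endian)
{
	uint8_t shift = narrow_load_shift(off, size, 4, big_endian);
	uint32_t mask = size >= 4 ? 0xffffffffu : (1u << (size * 8)) - 1;

	return (full >> shift) & mask;
}
```

With `full = 0x11223344`, the byte at offset 3 is `0x44` in a big-endian layout and `0x11` in a little-endian one, which is why a fixed little-endian shift produced the wrong value on s390.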
  20. 16 Jul, 2019 1 commit
  21. 08 Jul, 2019 1 commit
  22. 29 Jun, 2019 2 commits
  23. 28 Jun, 2019 1 commit
    • bpf: implement getsockopt and setsockopt hooks · 0d01da6a
      Stanislav Fomichev committed
      Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
      BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.
      
      BPF_CGROUP_SETSOCKOPT can modify user setsockopt arguments before
      passing them down to the kernel or bypass kernel completely.
      BPF_CGROUP_GETSOCKOPT can inspect/modify getsockopt arguments that
      kernel returns.
      Both hooks reuse existing PTR_TO_PACKET{,_END} infrastructure.
      
      The buffer memory is pre-allocated (because I don't think there is
      a precedent for working with __user memory from bpf). This might be
      slow to do for each {s,g}etsockopt call, that's why I've added
      __cgroup_bpf_prog_array_is_empty that exits early if there is nothing
      attached to a cgroup. Note, however, that there is a race between
      __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
      program layout might have changed; this should not be a problem
      because in general there is a race between multiple calls to
      {s,g}etsockopt and user adding/removing bpf progs from a cgroup.
      
      The return code of the BPF program is handled as follows:
      * 0: EPERM
      * 1: success, continue with next BPF program in the cgroup chain
      
      v9:
      * allow overwriting setsockopt arguments (Alexei Starovoitov):
        * use set_fs (same as kernel_setsockopt)
        * buffer is always kzalloc'd (no small on-stack buffer)
      
      v8:
      * use s32 for optlen (Andrii Nakryiko)
      
      v7:
      * return only 0 or 1 (Alexei Starovoitov)
      * always run all progs (Alexei Starovoitov)
      * use optval=0 as kernel bypass in setsockopt (Alexei Starovoitov)
        (decided to use optval=-1 instead, optval=0 might be a valid input)
      * call getsockopt hook after kernel handlers (Alexei Starovoitov)
      
      v6:
      * rework cgroup chaining; stop as soon as bpf program returns
        0 or 2; see patch with the documentation for the details
      * drop Andrii's and Martin's Acked-by (not sure they are comfortable
        with the new state of things)
      
      v5:
      * skip copy_to_user() and put_user() when ret == 0 (Martin Lau)
      
      v4:
      * don't export bpf_sk_fullsock helper (Martin Lau)
      * size != sizeof(__u64) for uapi pointers (Martin Lau)
      * offsetof instead of bpf_ctx_range when checking ctx access (Martin Lau)
      
      v3:
      * typos in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY comments (Andrii Nakryiko)
      * reverse christmas tree in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY (Andrii
        Nakryiko)
      * use __bpf_md_ptr instead of __u32 for optval{,_end} (Martin Lau)
      * use BPF_FIELD_SIZEOF() for consistency (Martin Lau)
      * new CG_SOCKOPT_ACCESS macro to wrap repeated parts
      
      v2:
      * moved bpf_sockopt_kern fields around to remove a hole (Martin Lau)
      * aligned bpf_sockopt_kern->buf to 8 bytes (Martin Lau)
      * bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau)
      * added [0,2] return code check to verifier (Martin Lau)
      * dropped unused buf[64] from the stack (Martin Lau)
      * use PTR_TO_SOCKET for bpf_sockopt->sk (Martin Lau)
      * dropped bpf_target_off from ctx rewrites (Martin Lau)
      * use return code for kernel bypass (Martin Lau & Andrii Nakryiko)
      
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Martin Lau <kafai@fb.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      0d01da6a
  24. 01 Jun, 2019 1 commit
    • bpf: cgroup inet skb programs can return 0 to 3 · 5cf1e914
      brakmo committed
      Allows cgroup inet skb programs to return values in the range [0, 3].
      The second bit is used to determine whether congestion occurred and the
      higher-level protocol should decrease its rate. E.g. TCP would call
      tcp_enter_cwr().
      
      The bpf_prog must set expected_attach_type to BPF_CGROUP_INET_EGRESS
      at load time if it uses the new return values (i.e. 2 or 3).
      
      The expected_attach_type is currently not enforced for
      BPF_PROG_TYPE_CGROUP_SKB, meaning that an existing bpf_prog with
      expected_attach_type set to BPF_CGROUP_INET_EGRESS can attach to
      BPF_CGROUP_INET_INGRESS.  Blindly enforcing expected_attach_type would
      break backward compatibility.
      
      This patch adds a enforce_expected_attach_type bit to only
      enforce the expected_attach_type when it uses the new
      return value.
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      5cf1e914
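The two-bit return value described above can be decoded as in the following sketch. The names `struct mock_egress_verdict`, `mock_decode_egress_ret()` and `mock_needs_attach_type_enforcement()` are hypothetical, and the sketch assumes the usual cgroup skb convention that the low bit means "let the packet through"; the commit message itself only spells out the role of the second bit.

```c
#include <assert.h>
#include <stdbool.h>

struct mock_egress_verdict {
	bool allow;		/* bit 0 (assumed): let the packet through */
	bool congestion;	/* bit 1: ask the protocol to slow down */
};

/* Split a program return value in [0, 3] into its two flags. */
static struct mock_egress_verdict mock_decode_egress_ret(unsigned int ret)
{
	struct mock_egress_verdict v;

	assert(ret <= 3);
	v.allow = (ret & 1u) != 0;
	v.congestion = (ret & 2u) != 0;
	return v;
}

/* Only the new values 2 and 3 call for enforcing expected_attach_type. */
static bool mock_needs_attach_type_enforcement(unsigned int ret)
{
	return ret >= 2;
}
```

Keeping the old values 0 and 1 outside the enforcement check is what preserves backward compatibility for programs that never use the congestion bit.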
  25. 25 May, 2019 2 commits
    • bpf: verifier: insert zero extension according to analysis result · a4b1d3c1
      Jiong Wang committed
      After previous patches, verifier will mark an insn if it really needs zero
      extension on dst_reg.
      
      It is then for back-ends to decide how to use such information to eliminate
      unnecessary zero extension code-gen during JIT compilation.
      
      One approach is verifier insert explicit zero extension for those insns
      that need zero extension in a generic way, JIT back-ends then do not
      generate zero extension for sub-register write at default.
      
      However, only those back-ends which do not have hardware zero extension
      want this optimization. Back-ends like x86_64 and AArch64 have hardware
      zero extension support that the insertion should be disabled.
      
      This patch introduces new target hook "bpf_jit_needs_zext" which returns
      false at default, meaning verifier zero extension insertion is disabled at
      default. A back-end could override this hook to return true if it doesn't
      have hardware support and want verifier insert zero extension explicitly.
      
      Offload targets do not use this native target hook, instead, they could
      get the optimization results using bpf_prog_offload_ops.finalize.
      
      NOTE: arches could have diversified features, it is possible for one arch
      to have hardware zero extension support for some sub-register write insns
      but not for all. For example, PowerPC, SPARC have zero extended loads, but
      not for alu32. So when verifier zero extension insertion enabled, these JIT
      back-ends need to peephole insns to remove those zero extension inserted
      for insn that actually has hardware zero extension support. The peephole
      could be as simple as looking the next insn, if it is a special zero
      extension insn then it is safe to eliminate it if the current insn has
      hardware zero extension support.
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      a4b1d3c1
    • bpf: introduce new mov32 variant for doing explicit zero extension · 7d134041
      Jiong Wang committed
      The encoding for this new variant is based on BPF_X format. "imm" field was
      0 only, now it could be 1 which means doing zero extension unconditionally
      
        .code = BPF_ALU | BPF_MOV | BPF_X
        .dst_reg = DST
        .src_reg = SRC
        .imm  = 1
      
      We use this new form for doing zero extension for which verifier will
      guarantee SRC == DST.
      
      Implications on JIT back-ends when doing code-gen for
      BPF_ALU | BPF_MOV | BPF_X:
        1. No change if hardware already does zero extension unconditionally for
           sub-register write.
        2. Otherwise, when seeing imm == 1, just generate insns to clear high
           32-bit. No need to generate insns for the move because when imm == 1,
           dst_reg is the same as src_reg at the moment.
      
      Interpreter doesn't need change as well. It is doing unconditionally zero
      extension for mov32 already.
      
      One helper macro BPF_ZEXT_REG is added to help creating zero extension
      insn using this new mov32 variant.
      
      One helper function insn_is_zext is added for checking whether an insn
      is a zero extension on dst. This will be widely used by a few JIT
      back-ends in later patches in this set.
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      7d134041
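The encoding described in this commit can be sketched with a mock instruction type. `struct mock_bpf_insn`, `mock_zext_reg()` and `mock_insn_is_zext()` are hypothetical userspace stand-ins for the kernel's struct bpf_insn, BPF_ZEXT_REG and insn_is_zext; the opcode field values are those of the BPF instruction encoding, and the check is a plausible reading of the description above rather than a copy of the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcode field values from the BPF instruction encoding. */
#define BPF_ALU	0x04
#define BPF_MOV	0xb0
#define BPF_X	0x08

/* Hypothetical mirror of the 8-byte BPF instruction layout. */
struct mock_bpf_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* Build the mov32 variant with imm == 1 and SRC == DST. */
static struct mock_bpf_insn mock_zext_reg(uint8_t reg)
{
	struct mock_bpf_insn insn = { 0 };

	assert(reg < 16);
	insn.code = BPF_ALU | BPF_MOV | BPF_X;
	insn.dst_reg = reg;
	insn.src_reg = reg;
	insn.imm = 1;
	return insn;
}

/* True if the instruction is the explicit zero-extension form. */
static bool mock_insn_is_zext(const struct mock_bpf_insn *insn)
{
	return insn->code == (BPF_ALU | BPF_MOV | BPF_X) && insn->imm == 1;
}
```

A JIT back-end without hardware zero extension could test each mov32 with such a predicate and emit only a "clear the upper 32 bits" sequence when it matches, skipping the move itself.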
  26. 30 Apr, 2019 2 commits
    • bpf: Use vmalloc special flag · d53d2f78
      Rick Edgecombe committed
      Use new flag VM_FLUSH_RESET_PERMS for handling freeing of special
      permissioned memory in vmalloc and remove places where memory was set RW
      before freeing which is no longer needed. Don't track if the memory is RO
      anymore because it is now tracked in vmalloc.
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <akpm@linux-foundation.org>
      Cc: <ard.biesheuvel@linaro.org>
      Cc: <deneen.t.dock@intel.com>
      Cc: <kernel-hardening@lists.openwall.com>
      Cc: <kristen@linux.intel.com>
      Cc: <linux_dti@icloud.com>
      Cc: <will.deacon@arm.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190426001143.4983-19-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d53d2f78
    • x86/modules: Avoid breaking W^X while loading modules · f2c65fb3
      Nadav Amit committed
      When modules and BPF filters are loaded, there is a time window in
      which some memory is both writable and executable. An attacker that has
      already found another vulnerability (e.g., a dangling pointer) might be
      able to exploit this behavior to overwrite kernel code. Prevent having
      writable executable PTEs in this stage.
      
      In addition, avoiding having W+X mappings can also slightly simplify the
      patching of modules code on initialization (e.g., by alternatives and
      static-key), as would be done in the next patch. This was actually the
      main motivation for this patch.
      
      To avoid having W+X mappings, set them initially as RW (NX) and after
      they are set as RO set them as X as well. Setting them as executable is
      done as a separate step to avoid one core in which the old PTE is cached
      (hence writable), and another which sees the updated PTE (executable),
      which would break the W^X protection.
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Suggested-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <akpm@linux-foundation.org>
      Cc: <ard.biesheuvel@linaro.org>
      Cc: <deneen.t.dock@intel.com>
      Cc: <kernel-hardening@lists.openwall.com>
      Cc: <kristen@linux.intel.com>
      Cc: <linux_dti@icloud.com>
      Cc: <will.deacon@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Link: https://lkml.kernel.org/r/20190426001143.4983-12-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f2c65fb3
  27. 13 Apr, 2019 3 commits
    • bpf: Add file_pos field to bpf_sysctl ctx · e1550bfe
      Andrey Ignatov committed
      Add file_pos field to bpf_sysctl context to read and write sysctl file
      position at which sysctl is being accessed (read or written).
      
      The field can be used to e.g. override whole sysctl value on write to
      sysctl even when sys_write is called by user space with file_pos > 0. Or
      BPF program may reject such accesses.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      e1550bfe
    • bpf: Introduce bpf_sysctl_{get,set}_new_value helpers · 4e63acdf
      Andrey Ignatov committed
      Add helpers to work with new value being written to sysctl by user
      space.
      
      bpf_sysctl_get_new_value() copies value being written to sysctl into
      provided buffer.
      
      bpf_sysctl_set_new_value() overrides new value being written by user
      space with a one from provided buffer. Buffer should contain string
      representation of the value, similar to what can be seen in /proc/sys/.
      
      Both helpers can be used only on sysctl write.
      
      File position matters and can be managed by an interface that will be
      introduced separately. E.g. if user space calls sys_write to a file in
      /proc/sys/ at file position = X, where X > 0, then the value set by
      bpf_sysctl_set_new_value() will be written starting from X. If program
      wants to override whole value with specified buffer, file position has
      to be set to zero.
      
      Documentation for the new helpers is provided in bpf.h UAPI.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      4e63acdf
    • bpf: Introduce bpf_sysctl_get_current_value helper · 1d11b301
      Andrey Ignatov committed
      Add bpf_sysctl_get_current_value() helper to copy current sysctl value
      into provided by BPF_PROG_TYPE_CGROUP_SYSCTL program buffer.
      
      It provides same string as user space can see by reading corresponding
      file in /proc/sys/, including new line, etc.
      
      Documentation for the new helper is provided in bpf.h UAPI.
      
      Since current value is kept in ctl_table->data in a parsed form,
      ctl_table->proc_handler() with write=0 is called to read that data and
      convert it to a string. Such a string can later be parsed by a program
      using helpers that will be introduced separately.
      
      Unfortunately it's not trivial to provide API to access parsed data due to
      variety of data representations (string, intvec, uintvec, ulongvec,
      custom structures, even NULL, etc). Instead it's assumed that user know
      how to handle specific sysctl they're interested in and appropriate
      helpers can be used.
      
      Since ctl_table->proc_handler() expects __user buffer, conversion to
      __user happens for kernel allocated one where the value is stored.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      1d11b301