1. 11 1月, 2018 1 次提交
    • D
      bpf: arsh is not supported in 32 bit alu thus reject it · 7891a87e
      Daniel Borkmann 提交于
      The following snippet was throwing an 'unknown opcode cc' warning
      in BPF interpreter:
      
        0: (18) r0 = 0x0
        2: (7b) *(u64 *)(r10 -16) = r0
        3: (cc) (u32) r0 s>>= (u32) r0
        4: (95) exit
      
      Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
      generation, not all of them do and interpreter does neither. We can
      leave existing ones and implement it later in bpf-next for the
      remaining ones, but reject this properly in verifier for the time
      being.
      
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      7891a87e
  2. 07 1月, 2018 1 次提交
  3. 24 12月, 2017 1 次提交
    • T
      x86/ldt: Make the LDT mapping RO · 9f5cb6b3
      Thomas Gleixner 提交于
      Now that the LDT mapping is in a known area when PAGE_TABLE_ISOLATION is
      enabled its a primary target for attacks, if a user space interface fails
      to validate a write address correctly. That can never happen, right?
      
      The SDM states:
      
          If the segment descriptors in the GDT or an LDT are placed in ROM, the
          processor can enter an indefinite loop if software or the processor
          attempts to update (write to) the ROM-based segment descriptors. To
          prevent this problem, set the accessed bits for all segment descriptors
          placed in a ROM. Also, remove operating-system or executive code that
          attempts to modify segment descriptors located in ROM.
      
      So its a valid approach to set the ACCESS bit when setting up the LDT entry
      and to map the table RO. Fixup the selftest so it can handle that new mode.
      
      Remove the manual ACCESS bit setter in set_tls_desc() as this is now
      pointless. Folded the patch from Peter Ziljstra.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9f5cb6b3
  4. 23 12月, 2017 1 次提交
    • T
      x86/ldt: Prevent LDT inheritance on exec · a4828f81
      Thomas Gleixner 提交于
      The LDT is inherited across fork() or exec(), but that makes no sense
      at all because exec() is supposed to start the process clean.
      
      The reason why this happens is that init_new_context_ldt() is called from
      init_new_context() which obviously needs to be called for both fork() and
      exec().
      
      It would be surprising if anything relies on that behaviour, so it seems to
      be safe to remove that misfeature.
      
      Split the context initialization into two parts. Clear the LDT pointer and
      initialize the mutex from the general context init and move the LDT
      duplication to arch_dup_mmap() which is only called on fork().
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: dan.j.williams@intel.com
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a4828f81
  5. 22 12月, 2017 1 次提交
  6. 21 12月, 2017 3 次提交
    • A
      bpf: do not allow root to mangle valid pointers · 82abbf8d
      Alexei Starovoitov 提交于
      Do not allow root to convert valid pointers into unknown scalars.
      In particular disallow:
       ptr &= reg
       ptr <<= reg
       ptr += ptr
      and explicitly allow:
       ptr -= ptr
      since pkt_end - pkt == length
      
      1.
      This minimizes amount of address leaks root can do.
      In the future may need to further tighten the leaks with kptr_restrict.
      
      2.
      If program has such pointer math it's likely a user mistake and
      when verifier complains about it right away instead of many instructions
      later on invalid memory access it's easier for users to fix their progs.
      
      3.
      when register holding a pointer cannot change to scalar it allows JITs to
      optimize better. Like 32-bit archs could use single register for pointers
      instead of a pair required to hold 64-bit scalars.
      
      4.
      reduces architecture dependent behavior. Since code:
      r1 = r10;
      r1 &= 0xff;
      if (r1 ...)
      will behave differently arm64 vs x64 and offloaded vs native.
      
      A significant chunk of ptr mangling was allowed by
      commit f1174f77 ("bpf/verifier: rework value tracking")
      yet some of it was allowed even earlier.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      82abbf8d
    • J
      selftests/bpf: add tests for recent bugfixes · 2255f8d5
      Jann Horn 提交于
      These tests should cover the following cases:
      
       - MOV with both zero-extended and sign-extended immediates
       - implicit truncation of register contents via ALU32/MOV32
       - implicit 32-bit truncation of ALU32 output
       - oversized register source operand for ALU32 shift
       - right-shift of a number that could be positive or negative
       - map access where adding the operation size to the offset causes signed
         32-bit overflow
       - direct stack access at a ~4GiB offset
      
      Also remove the F_LOAD_WITH_STRICT_ALIGNMENT flag from a bunch of tests
      that should fail independent of what flags userspace passes.
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      2255f8d5
    • N
      selftests: net: Adding config fragment CONFIG_NUMA=y · 1c8e77fb
      Naresh Kamboju 提交于
      kernel config fragement CONFIG_NUMA=y is need for reuseport_bpf_numa.
      Signed-off-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1c8e77fb
  7. 20 12月, 2017 1 次提交
  8. 16 12月, 2017 1 次提交
  9. 13 12月, 2017 1 次提交
    • D
      bpf: fix broken BPF selftest build · 720f228e
      Daniel Borkmann 提交于
      At least on x86_64, the kernel's BPF selftests seemed to have stopped
      to build due to 618e165b ("selftests/bpf: sync kernel headers and
      introduce arch support in Makefile"):
      
        [...]
        In file included from test_verifier.c:29:0:
        ../../../include/uapi/linux/bpf_perf_event.h:11:32:
           fatal error: asm/bpf_perf_event.h: No such file or directory
         #include <asm/bpf_perf_event.h>
                                      ^
        compilation terminated.
        [...]
      
      While pulling in tools/arch/*/include/uapi/asm/bpf_perf_event.h seems
      to work fine, there's no automated fall-back logic right now that would
      do the same out of tools/include/uapi/asm-generic/bpf_perf_event.h. The
      usual convention today is to add a include/[uapi/]asm/ equivalent that
      would pull in the correct arch header or generic one as fall-back, all
      ifdef'ed based on compiler target definition. It's similarly done also
      in other cases such as tools/include/asm/barrier.h, thus adapt the same
      here.
      
      Fixes: 618e165b ("selftests/bpf: sync kernel headers and introduce arch support in Makefile")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      720f228e
  10. 05 12月, 2017 1 次提交
  11. 01 12月, 2017 1 次提交
  12. 23 11月, 2017 1 次提交
    • G
      bpf: introduce ARG_PTR_TO_MEM_OR_NULL · db1ac496
      Gianluca Borello 提交于
      With the current ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM semantics, an helper
      argument can be NULL when the next argument type is ARG_CONST_SIZE_OR_ZERO
      and the verifier can prove the value of this next argument is 0. However,
      most helpers are just interested in handling <!NULL, 0>, so forcing them to
      deal with <NULL, 0> makes the implementation of those helpers more
      complicated for no apparent benefits, requiring them to explicitly handle
      those corner cases with checks that bpf programs could start relying upon,
      preventing the possibility of removing them later.
      
      Solve this by making ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM never accept NULL
      even when ARG_CONST_SIZE_OR_ZERO is set, and introduce a new argument type
      ARG_PTR_TO_MEM_OR_NULL to explicitly deal with the NULL case.
      
      Currently, the only helper that needs this is bpf_csum_diff_proto(), so
      change arg1 and arg3 to this new type as well.
      
      Also add a new battery of tests that explicitly test the
      !ARG_PTR_TO_MEM_OR_NULL combination: all the current ones testing the
      various <NULL, 0> variations are focused on bpf_csum_diff, so cover also
      other helpers.
      Signed-off-by: NGianluca Borello <g.borello@gmail.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      db1ac496
  13. 22 11月, 2017 1 次提交
    • Y
      bpf: change bpf_probe_write_user to bpf_trace_printk in test_verifier · f1a8b8e3
      Yonghong Song 提交于
      There are four tests in test_verifier using bpf_probe_write_user
      helper. These four tests will emit the following kernel messages
        [   12.974753] test_verifier[220] is installing a program with bpf_probe_write_user
                                          helper that may corrupt user memory!
        [   12.979285] test_verifier[220] is installing a program with bpf_probe_write_user
                                          helper that may corrupt user memory!
        ......
      
      This may confuse certain users. This patch replaces bpf_probe_write_user
      with bpf_trace_printk. The test_verifier already uses bpf_trace_printk
      earlier in the test and a trace_printk warning message has been printed.
      So this patch does not emit any more kernel messages.
      
      Fixes: b6ff6391 ("bpf: fix and add test cases for ARG_CONST_SIZE_OR_ZERO semantics change")
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      f1a8b8e3
  14. 21 11月, 2017 3 次提交
  15. 18 11月, 2017 1 次提交
  16. 16 11月, 2017 2 次提交
    • K
      x86/selftests: Add test for mapping placement for 5-level paging · 97f404ad
      Kirill A. Shutemov 提交于
      5-level paging provides a 56-bit virtual address space for user space
      application. But the kernel defaults to mappings below the 47-bit address
      space boundary, which is the upper bound for 4-level paging, unless an
      application explicitely request it by using a mmap(2) address hint above
      the 47-bit boundary. The kernel prevents mappings which spawn across the
      47-bit boundary unless mmap(2) was invoked with MAP_FIXED.
      
      Add a self-test that covers the corner cases of the interface and validates
      the correctness of the implementation.
      
      [ tglx: Massaged changelog once more ]
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20171115143607.81541-2-kirill.shutemov@linux.intel.com
      97f404ad
    • M
      mm, truncate: do not check mapping for every page being truncated · c7df8ad2
      Mel Gorman 提交于
      During truncation, the mapping has already been checked for shmem and
      dax so it's known that workingset_update_node is required.
      
      This patch avoids the checks on mapping for each page being truncated.
      In all other cases, a lookup helper is used to determine if
      workingset_update_node() needs to be called.  The one danger is that the
      API is slightly harder to use as calling workingset_update_node directly
      without checking for dax or shmem mappings could lead to surprises.
      However, the API rarely needs to be used and hopefully the comment is
      enough to give people the hint.
      
      sparsetruncate (tiny)
                                    4.14.0-rc4             4.14.0-rc4
                                   oneirq-v1r1        pickhelper-v1r1
      Min          Time      141.00 (   0.00%)      140.00 (   0.71%)
      1st-qrtle    Time      142.00 (   0.00%)      141.00 (   0.70%)
      2nd-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
      3rd-qrtle    Time      143.00 (   0.00%)      143.00 (   0.00%)
      Max-90%      Time      144.00 (   0.00%)      144.00 (   0.00%)
      Max-95%      Time      147.00 (   0.00%)      145.00 (   1.36%)
      Max-99%      Time      195.00 (   0.00%)      191.00 (   2.05%)
      Max          Time      230.00 (   0.00%)      205.00 (  10.87%)
      Amean        Time      144.37 (   0.00%)      143.82 (   0.38%)
      Stddev       Time       10.44 (   0.00%)        9.00 (  13.74%)
      Coeff        Time        7.23 (   0.00%)        6.26 (  13.41%)
      Best99%Amean Time      143.72 (   0.00%)      143.34 (   0.26%)
      Best95%Amean Time      142.37 (   0.00%)      142.00 (   0.26%)
      Best90%Amean Time      142.19 (   0.00%)      141.85 (   0.24%)
      Best75%Amean Time      141.92 (   0.00%)      141.58 (   0.24%)
      Best50%Amean Time      141.69 (   0.00%)      141.31 (   0.27%)
      Best25%Amean Time      141.38 (   0.00%)      140.97 (   0.29%)
      
      As you'd expect, the gain is marginal but it can be detected.  The
      differences in bonnie are all within the noise which is not surprising
      given the impact on the microbenchmark.
      
      radix_tree_update_node_t is a callback for some radix operations that
      optionally passes in a private field.  The only user of the callback is
      workingset_update_node and as it no longer requires a mapping, the
      private field is removed.
      
      Link: http://lkml.kernel.org/r/20171018075952.10627-3-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c7df8ad2
  17. 15 11月, 2017 15 次提交
  18. 14 11月, 2017 2 次提交
  19. 11 11月, 2017 2 次提交