1. 27 7月, 2017 1 次提交
    • J
      bpf: don't zero out the info struct in bpf_obj_get_info_by_fd() · d777b2dd
      Jakub Kicinski 提交于
      The buffer passed to bpf_obj_get_info_by_fd() should be initialized
      to zeros.  Kernel will enforce that to guarantee we can safely extend
      info structures in the future.
      
      Making the bpf_obj_get_info_by_fd() call in libbpf perform the zeroing
      is problematic, however, since some members of the info structures
      may need to be initialized by the callers (for instance pointers
      to buffers to which kernel is to dump translated and jited images).
      
      Remove the zeroing and fix up the in-tree callers before any kernel
      has been released with this code.
      
      As Daniel points out this seems to be the intended operation anyway,
      since commit 95b9afd3 ("bpf: Test for bpf ID") is itself setting
      the buffer pointers before calling bpf_obj_get_info_by_fd().
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d777b2dd
  2. 25 7月, 2017 1 次提交
  3. 21 7月, 2017 4 次提交
  4. 15 7月, 2017 2 次提交
    • L
      kmod: throttle kmod thread limit · 6d7964a7
      Luis R. Rodriguez 提交于
      If we reach the limit of modprobe_limit threads running the next
      request_module() call will fail.  The original reason for adding a kill
      was to do away with possible issues with in old circumstances which would
      create a recursive series of request_module() calls.
      
      We can do better than just be super aggressive and reject calls once we've
      reached the limit by simply making pending callers wait until the
      threshold has been reduced, and then throttling them in, one by one.
      
      This throttling enables requests over the kmod concurrent limit to be
      processed once a pending request completes.  Only the first item queued up
      to wait is woken up.  The assumption here is once a task is woken it will
      have no other option to also kick the queue to check if there are more
      pending tasks -- regardless of whether or not it was successful.
      
      By throttling and processing only max kmod concurrent tasks we ensure we
      avoid unexpected fatal request_module() calls, and we keep memory
      consumption on module loading to a minimum.
      
      With x86_64 qemu, with 4 cores, 4 GiB of RAM it takes the following run
      time to run both tests:
      
      time ./kmod.sh -t 0008
      real    0m16.366s
      user    0m0.883s
      sys     0m8.916s
      
      time ./kmod.sh -t 0009
      real    0m50.803s
      user    0m0.791s
      sys     0m9.852s
      
      Link: http://lkml.kernel.org/r/20170628223155.26472-4-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Reviewed-by: NPetr Mladek <pmladek@suse.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d7964a7
    • L
      kmod: add test driver to stress test the module loader · d9c6a72d
      Luis R. Rodriguez 提交于
      This adds a new stress test driver for kmod: the kernel module loader.
      The new stress test driver, test_kmod, is only enabled as a module right
      now.  It should be possible to load this as built-in and load tests
      early (refer to the force_init_test module parameter), however since a
      lot of test can get a system out of memory fast we leave this disabled
      for now.
      
      Using a system with 1024 MiB of RAM can *easily* get your kernel OOM
      fast with this test driver.
      
      The test_kmod driver exposes API knobs for us to fine tune simple
      request_module() and get_fs_type() calls.  Since these API calls only
      allow each one parameter a test driver for these is rather simple.
      Other factors that can help out test driver though are the number of
      calls we issue and knowing current limitations of each.  This exposes
      configuration as much as possible through userspace to be able to build
      tests directly from userspace.
      
      Since it allows multiple misc devices its will eventually (once we add a
      knob to let us create new devices at will) also be possible to perform
      more tests in parallel, provided you have enough memory.
      
      We only enable tests we know work as of right now.
      
      Demo screenshots:
      
       # tools/testing/selftests/kmod/kmod.sh
      kmod_test_0001_driver: OK! - loading kmod test
      kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
      kmod_test_0001_fs: OK! - loading kmod test
      kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
      kmod_test_0002_driver: OK! - loading kmod test
      kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
      kmod_test_0002_fs: OK! - loading kmod test
      kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
      kmod_test_0003: OK! - loading kmod test
      kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0004: OK! - loading kmod test
      kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0005: OK! - loading kmod test
      kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0006: OK! - loading kmod test
      kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0005: OK! - loading kmod test
      kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0006: OK! - loading kmod test
      kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      XXX: add test restult for 0007
      Test completed
      
      You can also request for specific tests:
      
       # tools/testing/selftests/kmod/kmod.sh -t 0001
      kmod_test_0001_driver: OK! - loading kmod test
      kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
      kmod_test_0001_fs: OK! - loading kmod test
      kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
      Test completed
      
      Lastly, the current available number of tests:
      
       # tools/testing/selftests/kmod/kmod.sh --help
      Usage: tools/testing/selftests/kmod/kmod.sh [ -t <4-number-digit> ]
      Valid tests: 0001-0009
      
      0001 - Simple test - 1 thread  for empty string
      0002 - Simple test - 1 thread  for modules/filesystems that do not exist
      0003 - Simple test - 1 thread  for get_fs_type() only
      0004 - Simple test - 2 threads for get_fs_type() only
      0005 - multithreaded tests with default setup - request_module() only
      0006 - multithreaded tests with default setup - get_fs_type() only
      0007 - multithreaded tests with default setup test request_module() and get_fs_type()
      0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
      0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()
      
      The following test cases currently fail, as such they are not currently
      enabled by default:
      
       # tools/testing/selftests/kmod/kmod.sh -t 0008
       # tools/testing/selftests/kmod/kmod.sh -t 0009
      
      To be sure to run them as intended please unload both of the modules:
      
        o test_module
        o xfs
      
      And ensure they are not loaded on your system prior to testing them.  If
      you use these paritions for your rootfs you can change the default test
      driver used for get_fs_type() by exporting it into your environment.  For
      example of other test defaults you can override refer to kmod.sh
      allow_user_defaults().
      
      Behind the scenes this is how we fine tune at a test case prior to
      hitting a trigger to run it:
      
      cat /sys/devices/virtual/misc/test_kmod0/config
      echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
      echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
      echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
      cat /sys/devices/virtual/misc/test_kmod0/config
      echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
      
      Finally to trigger:
      
      echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config
      
      The kmod.sh script uses the above constructs to build different test cases.
      
      A bit of interpretation of the current failures follows, first two
      premises:
      
      a) When request_module() is used userspace figures out an optimized
         version of module order for us.  Once it finds the modules it needs, as
         per depmod symbol dep map, it will finit_module() the respective
         modules which are needed for the original request_module() request.
      
      b) We have an optimization in place whereby if a kernel uses
         request_module() on a module already loaded we never bother userspace
         as the module already is loaded.  This is all handled by kernel/kmod.c.
      
      A few things to consider to help identify root causes of issues:
      
      0) kmod 19 has a broken heuristic for modules being assumed to be
         built-in to your kernel and will return 0 even though request_module()
         failed.  Upgrade to a newer version of kmod.
      
      1) A get_fs_type() call for "xfs" will request_module() for "fs-xfs",
         not for "xfs".  The optimization in kernel described in b) fails to
         catch if we have a lot of consecutive get_fs_type() calls.  The reason
         is the optimization in place does not look for aliases.  This means two
         consecutive get_fs_type() calls will bump kmod_concurrent, whereas
         request_module() will not.
      
      This one explanation why test case 0009 fails at least once for
      get_fs_type().
      
      2) If a module fails to load --- for whatever reason (kmod_concurrent
         limit reached, file not yet present due to rootfs switch, out of
         memory) we have a period of time during which module request for the
         same name either with request_module() or get_fs_type() will *also*
         fail to load even if the file for the module is ready.
      
      This explains why *multiple* NULLs are possible on test 0009.
      
      3) finit_module() consumes quite a bit of memory.
      
      4) Filesystems typically also have more dependent modules than other
         modules, its important to note though that even though a get_fs_type()
         call does not incur additional kmod_concurrent bumps, since userspace
         loads dependencies it finds it needs via finit_module_fd(), it *will*
         take much more memory to load a module with a lot of dependencies.
      
      Because of 3) and 4) we will easily run into out of memory failures with
      certain tests.  For instance test 0006 fails on qemu with 1024 MiB of RAM.
      It panics a box after reaping all userspace processes and still not
      having enough memory to reap.
      
      [arnd@arndb.de: add dependencies for test module]
        Link: http://lkml.kernel.org/r/20170630154834.3689272-1-arnd@arndb.de
      Link: http://lkml.kernel.org/r/20170628223155.26472-3-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d9c6a72d
  5. 13 7月, 2017 6 次提交
  6. 12 7月, 2017 1 次提交
    • Y
      samples/bpf: fix a build issue · 53335022
      Yonghong Song 提交于
      With latest net-next:
      
      ====
      clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h  -Isamples/bpf \
          -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
          -Wno-compare-distinct-pointer-types \
          -Wno-gnu-variable-sized-type-not-at-end \
          -Wno-address-of-packed-member -Wno-tautological-compare \
          -Wno-unknown-warning-option \
          -O2 -emit-llvm -c samples/bpf/tcp_synrto_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/tcp_synrto_kern.o
      samples/bpf/tcp_synrto_kern.c:20:10: fatal error: 'bpf_endian.h' file not found
                ^~~~~~~~~~~~~~
      1 error generated.
      ====
      
      net has the same issue.
      
      Add support for ntohl and htonl in tools/testing/selftests/bpf/bpf_endian.h.
      Also move bpf_helpers.h from samples/bpf to selftests/bpf and change
      compiler include logic so that programs in samples/bpf can access the headers
      in selftests/bpf, but not the other way around.
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53335022
  7. 10 7月, 2017 2 次提交
  8. 09 7月, 2017 3 次提交
  9. 06 7月, 2017 2 次提交
  10. 03 7月, 2017 1 次提交
  11. 01 7月, 2017 14 次提交
  12. 30 6月, 2017 1 次提交
    • D
      bpf: prevent leaking pointer via xadd on unpriviledged · 6bdf6abc
      Daniel Borkmann 提交于
      Leaking kernel addresses on unpriviledged is generally disallowed,
      for example, verifier rejects the following:
      
        0: (b7) r0 = 0
        1: (18) r2 = 0xffff897e82304400
        3: (7b) *(u64 *)(r1 +48) = r2
        R2 leaks addr into ctx
      
      Doing pointer arithmetic on them is also forbidden, so that they
      don't turn into unknown value and then get leaked out. However,
      there's xadd as a special case, where we don't check the src reg
      for being a pointer register, e.g. the following will pass:
      
        0: (b7) r0 = 0
        1: (7b) *(u64 *)(r1 +48) = r0
        2: (18) r2 = 0xffff897e82304400 ; map
        4: (db) lock *(u64 *)(r1 +48) += r2
        5: (95) exit
      
      We could store the pointer into skb->cb, loose the type context,
      and then read it out from there again to leak it eventually out
      of a map value. Or more easily in a different variant, too:
      
         0: (bf) r6 = r1
         1: (7a) *(u64 *)(r10 -8) = 0
         2: (bf) r2 = r10
         3: (07) r2 += -8
         4: (18) r1 = 0x0
         6: (85) call bpf_map_lookup_elem#1
         7: (15) if r0 == 0x0 goto pc+3
         R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp
         8: (b7) r3 = 0
         9: (7b) *(u64 *)(r0 +0) = r3
        10: (db) lock *(u64 *)(r0 +0) += r6
        11: (b7) r0 = 0
        12: (95) exit
      
        from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp
        11: (b7) r0 = 0
        12: (95) exit
      
      Prevent this by checking xadd src reg for pointer types. Also
      add a couple of test cases related to this.
      
      Fixes: 1be7f75d ("bpf: enable non-root eBPF programs")
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6bdf6abc
  13. 28 6月, 2017 1 次提交
  14. 27 6月, 2017 1 次提交
    • K
      seccomp: Adjust selftests to avoid double-join · 93bd70e3
      Kees Cook 提交于
      While glibc's pthread implementation is rather forgiving about repeat
      thread joining, Bionic has recently become much more strict. To deal with
      this, actually track which threads have been successfully joined and kill
      the rest at teardown.
      
      Based on a patch from Paul Lawrence.
      
      Cc: Paul Lawrence <paullawrence@google.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      93bd70e3