1. 09 9月, 2017 1 次提交
  2. 16 8月, 2017 2 次提交
    • N
      lib: Add zstd modules · 73f3d1b4
      Nick Terrell 提交于
      Add zstd compression and decompression kernel modules.
      zstd offers a wide varity of compression speed and quality trade-offs.
      It can compress at speeds approaching lz4, and quality approaching lzma.
      zstd decompressions at speeds more than twice as fast as zlib, and
      decompression speed remains roughly the same across all compression levels.
      
      The code was ported from the upstream zstd source repository. The
      `linux/zstd.h` header was modified to match linux kernel style.
      The cross-platform and allocation code was stripped out. Instead zstd
      requires the caller to pass a preallocated workspace. The source files
      were clang-formatted [1] to match the Linux Kernel style as much as
      possible. Otherwise, the code was unmodified. We would like to avoid
      as much further manual modification to the source code as possible, so it
      will be easier to keep the kernel zstd up to date.
      
      I benchmarked zstd compression as a special character device. I ran zstd
      and zlib compression at several levels, as well as performing no
      compression, which measure the time spent copying the data to kernel space.
      Data is passed to the compresser 4096 B at a time. The benchmark file is
      located in the upstream zstd source repository under
      `contrib/linux-kernel/zstd_compress_test.c` [2].
      
      I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
      The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
      16 GB of RAM, and a SSD. I benchmarked using `silesia.tar` [3], which is
      211,988,480 B large. Run the following commands for the benchmark:
      
          sudo modprobe zstd_compress_test
          sudo mknod zstd_compress_test c 245 0
          sudo cp silesia.tar zstd_compress_test
      
      The time is reported by the time of the userland `cp`.
      The MB/s is computed with
      
          1,536,217,008 B / time(buffer size, hash)
      
      which includes the time to copy from userland.
      The Adjusted MB/s is computed with
      
          1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).
      
      The memory reported is the amount of memory the compressor requests.
      
      | Method   | Size (B) | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
      |----------|----------|----------|-------|---------|----------|----------|
      | none     | 11988480 |    0.100 |     1 | 2119.88 |        - |        - |
      | zstd -1  | 73645762 |    1.044 | 2.878 |  203.05 |   224.56 |     1.23 |
      | zstd -3  | 66988878 |    1.761 | 3.165 |  120.38 |   127.63 |     2.47 |
      | zstd -5  | 65001259 |    2.563 | 3.261 |   82.71 |    86.07 |     2.86 |
      | zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |    16.13 |    13.22 |
      | zstd -15 | 58009756 |   47.601 | 3.654 |    4.45 |     4.46 |    21.61 |
      | zstd -19 | 54014593 |  102.835 | 3.925 |    2.06 |     2.06 |    60.15 |
      | zlib -1  | 77260026 |    2.895 | 2.744 |   73.23 |    75.85 |     0.27 |
      | zlib -3  | 72972206 |    4.116 | 2.905 |   51.50 |    52.79 |     0.27 |
      | zlib -6  | 68190360 |    9.633 | 3.109 |   22.01 |    22.24 |     0.27 |
      | zlib -9  | 67613382 |   22.554 | 3.135 |    9.40 |     9.44 |     0.27 |
      
      I benchmarked zstd decompression using the same method on the same machine.
      The benchmark file is located in the upstream zstd repo under
      `contrib/linux-kernel/zstd_decompress_test.c` [4]. The memory reported is
      the amount of memory required to decompress data compressed with the given
      compression level. If you know the maximum size of your input, you can
      reduce the memory usage of decompression irrespective of the compression
      level.
      
      | Method   | Time (s) | MB/s    | Adjusted MB/s | Memory (MB) |
      |----------|----------|---------|---------------|-------------|
      | none     |    0.025 | 8479.54 |             - |           - |
      | zstd -1  |    0.358 |  592.15 |        636.60 |        0.84 |
      | zstd -3  |    0.396 |  535.32 |        571.40 |        1.46 |
      | zstd -5  |    0.396 |  535.32 |        571.40 |        1.46 |
      | zstd -10 |    0.374 |  566.81 |        607.42 |        2.51 |
      | zstd -15 |    0.379 |  559.34 |        598.84 |        4.61 |
      | zstd -19 |    0.412 |  514.54 |        547.77 |        8.80 |
      | zlib -1  |    0.940 |  225.52 |        231.68 |        0.04 |
      | zlib -3  |    0.883 |  240.08 |        247.07 |        0.04 |
      | zlib -6  |    0.844 |  251.17 |        258.84 |        0.04 |
      | zlib -9  |    0.837 |  253.27 |        287.64 |        0.04 |
      
      Tested in userland using the test-suite in the zstd repo under
      `contrib/linux-kernel/test/UserlandTest.cpp` [5] by mocking the kernel
      functions. Fuzz tested using libfuzzer [6] with the fuzz harnesses under
      `contrib/linux-kernel/test/{RoundTripCrash.c,DecompressCrash.c}` [7] [8]
      with ASAN, UBSAN, and MSAN. Additionaly, it was tested while testing the
      BtrFS and SquashFS patches coming next.
      
      [1] https://clang.llvm.org/docs/ClangFormat.html
      [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_compress_test.c
      [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
      [4] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_decompress_test.c
      [5] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/UserlandTest.cpp
      [6] http://llvm.org/docs/LibFuzzer.html
      [7] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/RoundTripCrash.c
      [8] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/DecompressCrash.c
      
      zstd source repository: https://github.com/facebook/zstdSigned-off-by: NNick Terrell <terrelln@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      73f3d1b4
    • N
      lib: Add xxhash module · 5d240522
      Nick Terrell 提交于
      Adds xxhash kernel module with xxh32 and xxh64 hashes. xxhash is an
      extremely fast non-cryptographic hash algorithm for checksumming.
      The zstd compression and decompression modules added in the next patch
      require xxhash. I extracted it out from zstd since it is useful on its
      own. I copied the code from the upstream XXHash source repository and
      translated it into kernel style. I ran benchmarks and tests in the kernel
      and tests in userland.
      
      I benchmarked xxhash as a special character device. I ran in four modes,
      no-op, xxh32, xxh64, and crc32. The no-op mode simply copies the data to
      kernel space and ignores it. The xxh32, xxh64, and crc32 modes compute
      hashes on the copied data. I also ran it with four different buffer sizes.
      The benchmark file is located in the upstream zstd source repository under
      `contrib/linux-kernel/xxhash_test.c` [1].
      
      I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
      The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
      16 GB of RAM, and a SSD. I benchmarked using the file `filesystem.squashfs`
      from `ubuntu-16.10-desktop-amd64.iso`, which is 1,536,217,088 B large.
      Run the following commands for the benchmark:
      
          modprobe xxhash_test
          mknod xxhash_test c 245 0
          time cp filesystem.squashfs xxhash_test
      
      The time is reported by the time of the userland `cp`.
      The GB/s is computed with
      
          1,536,217,008 B / time(buffer size, hash)
      
      which includes the time to copy from userland.
      The Normalized GB/s is computed with
      
          1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).
      
      | Buffer Size (B) | Hash  | Time (s) | GB/s | Adjusted GB/s |
      |-----------------|-------|----------|------|---------------|
      |            1024 | none  |    0.408 | 3.77 |             - |
      |            1024 | xxh32 |    0.649 | 2.37 |          6.37 |
      |            1024 | xxh64 |    0.542 | 2.83 |         11.46 |
      |            1024 | crc32 |    1.290 | 1.19 |          1.74 |
      |            4096 | none  |    0.380 | 4.04 |             - |
      |            4096 | xxh32 |    0.645 | 2.38 |          5.79 |
      |            4096 | xxh64 |    0.500 | 3.07 |         12.80 |
      |            4096 | crc32 |    1.168 | 1.32 |          1.95 |
      |            8192 | none  |    0.351 | 4.38 |             - |
      |            8192 | xxh32 |    0.614 | 2.50 |          5.84 |
      |            8192 | xxh64 |    0.464 | 3.31 |         13.60 |
      |            8192 | crc32 |    1.163 | 1.32 |          1.89 |
      |           16384 | none  |    0.346 | 4.43 |             - |
      |           16384 | xxh32 |    0.590 | 2.60 |          6.30 |
      |           16384 | xxh64 |    0.466 | 3.30 |         12.80 |
      |           16384 | crc32 |    1.183 | 1.30 |          1.84 |
      
      Tested in userland using the test-suite in the zstd repo under
      `contrib/linux-kernel/test/XXHashUserlandTest.cpp` [2] by mocking the
      kernel functions. A line in each branch of every function in `xxhash.c`
      was commented out to ensure that the test-suite fails. Additionally
      tested while testing zstd and with SMHasher [3].
      
      [1] https://phabricator.intern.facebook.com/P57526246
      [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/XXHashUserlandTest.cpp
      [3] https://github.com/aappleby/smhasher
      
      zstd source repository: https://github.com/facebook/zstd
      XXHash source repository: https://github.com/cyan4973/xxhashSigned-off-by: NNick Terrell <terrelln@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      5d240522
  3. 15 7月, 2017 1 次提交
    • L
      kmod: add test driver to stress test the module loader · d9c6a72d
      Luis R. Rodriguez 提交于
      This adds a new stress test driver for kmod: the kernel module loader.
      The new stress test driver, test_kmod, is only enabled as a module right
      now.  It should be possible to load this as built-in and load tests
      early (refer to the force_init_test module parameter), however since a
      lot of test can get a system out of memory fast we leave this disabled
      for now.
      
      Using a system with 1024 MiB of RAM can *easily* get your kernel OOM
      fast with this test driver.
      
      The test_kmod driver exposes API knobs for us to fine tune simple
      request_module() and get_fs_type() calls.  Since these API calls only
      allow each one parameter a test driver for these is rather simple.
      Other factors that can help out test driver though are the number of
      calls we issue and knowing current limitations of each.  This exposes
      configuration as much as possible through userspace to be able to build
      tests directly from userspace.
      
      Since it allows multiple misc devices its will eventually (once we add a
      knob to let us create new devices at will) also be possible to perform
      more tests in parallel, provided you have enough memory.
      
      We only enable tests we know work as of right now.
      
      Demo screenshots:
      
       # tools/testing/selftests/kmod/kmod.sh
      kmod_test_0001_driver: OK! - loading kmod test
      kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
      kmod_test_0001_fs: OK! - loading kmod test
      kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
      kmod_test_0002_driver: OK! - loading kmod test
      kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
      kmod_test_0002_fs: OK! - loading kmod test
      kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
      kmod_test_0003: OK! - loading kmod test
      kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0004: OK! - loading kmod test
      kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0005: OK! - loading kmod test
      kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0006: OK! - loading kmod test
      kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0005: OK! - loading kmod test
      kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      kmod_test_0006: OK! - loading kmod test
      kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
      XXX: add test restult for 0007
      Test completed
      
      You can also request for specific tests:
      
       # tools/testing/selftests/kmod/kmod.sh -t 0001
      kmod_test_0001_driver: OK! - loading kmod test
      kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
      kmod_test_0001_fs: OK! - loading kmod test
      kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
      Test completed
      
      Lastly, the current available number of tests:
      
       # tools/testing/selftests/kmod/kmod.sh --help
      Usage: tools/testing/selftests/kmod/kmod.sh [ -t <4-number-digit> ]
      Valid tests: 0001-0009
      
      0001 - Simple test - 1 thread  for empty string
      0002 - Simple test - 1 thread  for modules/filesystems that do not exist
      0003 - Simple test - 1 thread  for get_fs_type() only
      0004 - Simple test - 2 threads for get_fs_type() only
      0005 - multithreaded tests with default setup - request_module() only
      0006 - multithreaded tests with default setup - get_fs_type() only
      0007 - multithreaded tests with default setup test request_module() and get_fs_type()
      0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
      0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()
      
      The following test cases currently fail, as such they are not currently
      enabled by default:
      
       # tools/testing/selftests/kmod/kmod.sh -t 0008
       # tools/testing/selftests/kmod/kmod.sh -t 0009
      
      To be sure to run them as intended please unload both of the modules:
      
        o test_module
        o xfs
      
      And ensure they are not loaded on your system prior to testing them.  If
      you use these paritions for your rootfs you can change the default test
      driver used for get_fs_type() by exporting it into your environment.  For
      example of other test defaults you can override refer to kmod.sh
      allow_user_defaults().
      
      Behind the scenes this is how we fine tune at a test case prior to
      hitting a trigger to run it:
      
      cat /sys/devices/virtual/misc/test_kmod0/config
      echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
      echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
      echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
      cat /sys/devices/virtual/misc/test_kmod0/config
      echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
      
      Finally to trigger:
      
      echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config
      
      The kmod.sh script uses the above constructs to build different test cases.
      
      A bit of interpretation of the current failures follows, first two
      premises:
      
      a) When request_module() is used userspace figures out an optimized
         version of module order for us.  Once it finds the modules it needs, as
         per depmod symbol dep map, it will finit_module() the respective
         modules which are needed for the original request_module() request.
      
      b) We have an optimization in place whereby if a kernel uses
         request_module() on a module already loaded we never bother userspace
         as the module already is loaded.  This is all handled by kernel/kmod.c.
      
      A few things to consider to help identify root causes of issues:
      
      0) kmod 19 has a broken heuristic for modules being assumed to be
         built-in to your kernel and will return 0 even though request_module()
         failed.  Upgrade to a newer version of kmod.
      
      1) A get_fs_type() call for "xfs" will request_module() for "fs-xfs",
         not for "xfs".  The optimization in kernel described in b) fails to
         catch if we have a lot of consecutive get_fs_type() calls.  The reason
         is the optimization in place does not look for aliases.  This means two
         consecutive get_fs_type() calls will bump kmod_concurrent, whereas
         request_module() will not.
      
      This one explanation why test case 0009 fails at least once for
      get_fs_type().
      
      2) If a module fails to load --- for whatever reason (kmod_concurrent
         limit reached, file not yet present due to rootfs switch, out of
         memory) we have a period of time during which module request for the
         same name either with request_module() or get_fs_type() will *also*
         fail to load even if the file for the module is ready.
      
      This explains why *multiple* NULLs are possible on test 0009.
      
      3) finit_module() consumes quite a bit of memory.
      
      4) Filesystems typically also have more dependent modules than other
         modules, its important to note though that even though a get_fs_type()
         call does not incur additional kmod_concurrent bumps, since userspace
         loads dependencies it finds it needs via finit_module_fd(), it *will*
         take much more memory to load a module with a lot of dependencies.
      
      Because of 3) and 4) we will easily run into out of memory failures with
      certain tests.  For instance test 0006 fails on qemu with 1024 MiB of RAM.
      It panics a box after reaping all userspace processes and still not
      having enough memory to reap.
      
      [arnd@arndb.de: add dependencies for test module]
        Link: http://lkml.kernel.org/r/20170630154834.3689272-1-arnd@arndb.de
      Link: http://lkml.kernel.org/r/20170628223155.26472-3-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d9c6a72d
  4. 13 7月, 2017 1 次提交
  5. 06 7月, 2017 1 次提交
    • J
      lib: add errseq_t type and infrastructure for handling it · 84cbadad
      Jeff Layton 提交于
      An errseq_t is a way of recording errors in one place, and allowing any
      number of "subscribers" to tell whether an error has been set again
      since a previous time.
      
      It's implemented as an unsigned 32-bit value that is managed with atomic
      operations. The low order bits are designated to hold an error code
      (max size of MAX_ERRNO). The upper bits are used as a counter.
      
      The API works with consumers sampling an errseq_t value at a particular
      point in time. Later, that value can be used to tell whether new errors
      have been set since that time.
      
      Note that there is a 1 in 512k risk of collisions here if new errors
      are being recorded frequently, since we have so few bits to use as a
      counter. To mitigate this, one bit is used as a flag to tell whether the
      value has been sampled since a new value was recorded. That allows
      us to avoid bumping the counter if no one has sampled it since it
      was last bumped.
      
      Later patches will build on this infrastructure to change how writeback
      errors are tracked in the kernel.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      84cbadad
  6. 09 6月, 2017 2 次提交
  7. 09 5月, 2017 1 次提交
  8. 27 4月, 2017 1 次提交
  9. 29 3月, 2017 1 次提交
  10. 24 3月, 2017 1 次提交
  11. 25 2月, 2017 3 次提交
  12. 24 2月, 2017 1 次提交
  13. 14 2月, 2017 2 次提交
  14. 04 2月, 2017 1 次提交
    • J
      lib: Introduce priority array area manager · 44091d29
      Jiri Pirko 提交于
      This introduces a infrastructure for management of linear priority
      areas. Priority order in an array matters, however order of items inside
      a priority group does not matter.
      
      As an initial implementation, L-sort algorithm is used. It is quite
      trivial. More advanced algorithm called P-sort will be introduced as a
      follow-up. The infrastructure is prepared for other algos.
      
      Alongside this, a testing module is introduced as well.
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44091d29
  15. 03 2月, 2017 1 次提交
    • J
      ext4: move halfmd4 into hash.c directly · 1c83a9aa
      Jason A. Donenfeld 提交于
      The "half md4" transform should not be used by any new code. And
      fortunately, it's only used now by ext4. Since ext4 supports several
      hashing methods, at some point it might be desirable to move to
      something like SipHash. As an intermediate step, remove half md4 from
      cryptohash.h and lib, and make it just a local function in ext4's
      hash.c. There's precedent for doing this; the other function ext can use
      for its hashes -- TEA -- is also implemented in the same place. Also, by
      being a local function, this might allow gcc to perform some additional
      optimizations.
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1c83a9aa
  16. 25 1月, 2017 2 次提交
  17. 10 1月, 2017 1 次提交
    • J
      siphash: add cryptographically secure PRF · 2c956a60
      Jason A. Donenfeld 提交于
      SipHash is a 64-bit keyed hash function that is actually a
      cryptographically secure PRF, like HMAC. Except SipHash is super fast,
      and is meant to be used as a hashtable keyed lookup function, or as a
      general PRF for short input use cases, such as sequence numbers or RNG
      chaining.
      
      For the first usage:
      
      There are a variety of attacks known as "hashtable poisoning" in which an
      attacker forms some data such that the hash of that data will be the
      same, and then preceeds to fill up all entries of a hashbucket. This is
      a realistic and well-known denial-of-service vector. Currently
      hashtables use jhash, which is fast but not secure, and some kind of
      rotating key scheme (or none at all, which isn't good). SipHash is meant
      as a replacement for jhash in these cases.
      
      There are a modicum of places in the kernel that are vulnerable to
      hashtable poisoning attacks, either via userspace vectors or network
      vectors, and there's not a reliable mechanism inside the kernel at the
      moment to fix it. The first step toward fixing these issues is actually
      getting a secure primitive into the kernel for developers to use. Then
      we can, bit by bit, port things over to it as deemed appropriate.
      
      While SipHash is extremely fast for a cryptographically secure function,
      it is likely a bit slower than the insecure jhash, and so replacements
      will be evaluated on a case-by-case basis based on whether or not the
      difference in speed is negligible and whether or not the current jhash usage
      poses a real security risk.
      
      For the second usage:
      
      A few places in the kernel are using MD5 or SHA1 for creating secure
      sequence numbers, syn cookies, port numbers, or fast random numbers.
      SipHash is a faster and more fitting, and more secure replacement for MD5
      in those situations. Replacing MD5 and SHA1 with SipHash for these uses is
      obvious and straight-forward, and so is submitted along with this patch
      series. There shouldn't be much of a debate over its efficacy.
      
      Dozens of languages are already using this internally for their hash
      tables and PRFs. Some of the BSDs already use this in their kernels.
      SipHash is a widely known high-speed solution to a widely known set of
      problems, and it's time we catch-up.
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: NJean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c956a60
  18. 27 12月, 2016 1 次提交
  19. 25 12月, 2016 1 次提交
  20. 12 10月, 2016 1 次提交
  21. 21 9月, 2016 1 次提交
  22. 17 9月, 2016 1 次提交
  23. 31 8月, 2016 1 次提交
    • J
      mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS · 0d025d27
      Josh Poimboeuf 提交于
      There are three usercopy warnings which are currently being silenced for
      gcc 4.6 and newer:
      
      1) "copy_from_user() buffer size is too small" compile warning/error
      
         This is a static warning which happens when object size and copy size
         are both const, and copy size > object size.  I didn't see any false
         positives for this one.  So the function warning attribute seems to
         be working fine here.
      
         Note this scenario is always a bug and so I think it should be
         changed to *always* be an error, regardless of
         CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.
      
      2) "copy_from_user() buffer size is not provably correct" compile warning
      
         This is another static warning which happens when I enable
         __compiletime_object_size() for new compilers (and
         CONFIG_DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size
         is const, but copy size is *not*.  In this case there's no way to
         compare the two at build time, so it gives the warning.  (Note the
         warning is a byproduct of the fact that gcc has no way of knowing
         whether the overflow function will be called, so the call isn't dead
         code and the warning attribute is activated.)
      
         So this warning seems to only indicate "this is an unusual pattern,
         maybe you should check it out" rather than "this is a bug".
      
         I get 102(!) of these warnings with allyesconfig and the
         __compiletime_object_size() gcc check removed.  I don't know if there
         are any real bugs hiding in there, but from looking at a small
         sample, I didn't see any.  According to Kees, it does sometimes find
         real bugs.  But the false positive rate seems high.
      
      3) "Buffer overflow detected" runtime warning
      
         This is a runtime warning where object size is const, and copy size >
         object size.
      
      All three warnings (both static and runtime) were completely disabled
      for gcc 4.6 with the following commit:
      
        2fb0815c ("gcc4: disable __compiletime_object_size for GCC 4.6+")
      
      That commit mistakenly assumed that the false positives were caused by a
      gcc bug in __compiletime_object_size().  But in fact,
      __compiletime_object_size() seems to be working fine.  The false
      positives were instead triggered by #2 above.  (Though I don't have an
      explanation for why the warnings supposedly only started showing up in
      gcc 4.6.)
      
      So remove warning #2 to get rid of all the false positives, and re-enable
      warnings #1 and #3 by reverting the above commit.
      
      Furthermore, since #1 is a real bug which is detected at compile time,
      upgrade it to always be an error.
      
      Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
      needed.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0d025d27
  24. 03 7月, 2016 1 次提交
  25. 08 6月, 2016 1 次提交
  26. 31 5月, 2016 1 次提交
  27. 29 5月, 2016 1 次提交
    • G
      <linux/hash.h>: Add support for architecture-specific functions · 468a9428
      George Spelvin 提交于
      This is just the infrastructure; there are no users yet.
      
      This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
      the existence of <asm/hash.h>.
      
      That file may define its own versions of various functions, and define
      HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.
      
      Included is a self-test (in lib/test_hash.c) that verifies the basics.
      It is NOT in general required that the arch-specific functions compute
      the same thing as the generic, but if a HAVE_* symbol is defined with
      the value 1, then equality is tested.
      Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Philippe De Muyter <phdm@macq.eu>
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: Alistair Francis <alistai@xilinx.com>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      468a9428
  28. 20 5月, 2016 1 次提交
  29. 16 4月, 2016 1 次提交
  30. 01 4月, 2016 1 次提交
  31. 26 3月, 2016 1 次提交
    • A
      mm, kasan: stackdepot implementation. Enable stackdepot for SLAB · cd11016e
      Alexander Potapenko 提交于
      Implement the stack depot and provide CONFIG_STACKDEPOT.  Stack depot
      will allow KASAN store allocation/deallocation stack traces for memory
      chunks.  The stack traces are stored in a hash table and referenced by
      handles which reside in the kasan_alloc_meta and kasan_free_meta
      structures in the allocated memory chunks.
      
      IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
      duplication.
      
      Right now stackdepot support is only enabled in SLAB allocator.  Once
      KASAN features in SLAB are on par with those in SLUB we can switch SLUB
      to stackdepot as well, thus removing the dependency on SLUB stack
      bookkeeping, which wastes a lot of memory.
      
      This patch is based on the "mm: kasan: stack depots" patch originally
      prepared by Dmitry Chernenkov.
      
      Joonsoo has said that he plans to reuse the stackdepot code for the
      mm/page_owner.c debugging facility.
      
      [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
      [aryabinin@virtuozzo.com: comment style fixes]
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cd11016e
  32. 23 3月, 2016 1 次提交
    • D
      kernel: add kcov code coverage · 5c9a8750
      Dmitry Vyukov 提交于
      kcov provides code coverage collection for coverage-guided fuzzing
      (randomized testing).  Coverage-guided fuzzing is a testing technique
      that uses coverage feedback to determine new interesting inputs to a
      system.  A notable user-space example is AFL
      (http://lcamtuf.coredump.cx/afl/).  However, this technique is not
      widely used for kernel testing due to missing compiler and kernel
      support.
      
      kcov does not aim to collect as much coverage as possible.  It aims to
      collect more or less stable coverage that is function of syscall inputs.
      To achieve this goal it does not collect coverage in soft/hard
      interrupts and instrumentation of some inherently non-deterministic or
      non-interesting parts of kernel is disbled (e.g.  scheduler, locking).
      
      Currently there is a single coverage collection mode (tracing), but the
      API anticipates additional collection modes.  Initially I also
      implemented a second mode which exposes coverage in a fixed-size hash
      table of counters (what Quentin used in his original patch).  I've
      dropped the second mode for simplicity.
      
      This patch adds the necessary support on kernel side.  The complimentary
      compiler support was added in gcc revision 231296.
      
      We've used this support to build syzkaller system call fuzzer, which has
      found 90 kernel bugs in just 2 months:
      
        https://github.com/google/syzkaller/wiki/Found-Bugs
      
      We've also found 30+ bugs in our internal systems with syzkaller.
      Another (yet unexplored) direction where kcov coverage would greatly
      help is more traditional "blob mutation".  For example, mounting a
      random blob as a filesystem, or receiving a random blob over wire.
      
      Why not gcov.  Typical fuzzing loop looks as follows: (1) reset
      coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
      typical coverage can be just a dozen of basic blocks (e.g.  an invalid
      input).  In such context gcov becomes prohibitively expensive as
      reset/collect coverage steps depend on total number of basic
      blocks/edges in program (in case of kernel it is about 2M).  Cost of
      kcov depends only on number of executed basic blocks/edges.  On top of
      that, kernel requires per-thread coverage because there are always
      background threads and unrelated processes that also produce coverage.
      With inlined gcov instrumentation per-thread coverage is not possible.
      
      kcov exposes kernel PCs and control flow to user-space which is
      insecure.  But debugfs should not be mapped as user accessible.
      
      Based on a patch by Quentin Casasnovas.
      
      [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
      [akpm@linux-foundation.org: unbreak allmodconfig]
      [akpm@linux-foundation.org: follow x86 Makefile layout standards]
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Tavis Ormandy <taviso@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: David Drysdale <drysdale@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5c9a8750
  33. 02 3月, 2016 1 次提交
  34. 20 2月, 2016 1 次提交