1. 25 2月, 2017 2 次提交
  2. 24 2月, 2017 1 次提交
  3. 10 2月, 2017 2 次提交
    • K
      time: Remove CONFIG_TIMER_STATS · dfb4357d
      Kees Cook 提交于
      Currently CONFIG_TIMER_STATS exposes process information across namespaces:
      
      kernel/time/timer_list.c print_timer():
      
              SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);
      
      /proc/timer_list:
      
       #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570
      
      Given that the tracer can give the same information, this patch entirely
      removes CONFIG_TIMER_STATS.
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
      Cc: linux-doc@vger.kernel.org
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Xing Gao <xgao01@email.wm.edu>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jessica Frazelle <me@jessfraz.com>
      Cc: kernel-hardening@lists.openwall.com
      Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-api@vger.kernel.org
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170208192659.GA32582@beastSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      dfb4357d
    • P
      refcount_t: Introduce a special purpose refcount type · f405df5d
      Peter Zijlstra 提交于
      Provide refcount_t, an atomic_t like primitive built just for
      refcounting.
      
      It provides saturation semantics such that overflow becomes impossible
      and thereby 'spurious' use-after-free is avoided.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f405df5d
  4. 04 2月, 2017 1 次提交
    • J
      lib: Introduce priority array area manager · 44091d29
      Jiri Pirko 提交于
      This introduces a infrastructure for management of linear priority
      areas. Priority order in an array matters, however order of items inside
      a priority group does not matter.
      
      As an initial implementation, L-sort algorithm is used. It is quite
      trivial. More advanced algorithm called P-sort will be introduced as a
      follow-up. The infrastructure is prepared for other algos.
      
      Alongside this, a testing module is introduced as well.
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44091d29
  5. 24 1月, 2017 1 次提交
    • M
      rcu: Enable RCU tracepoints by default to aid in debugging · 96151825
      Matt Fleming 提交于
      While debugging a performance issue I needed to understand why
      RCU sofitrqs were firing so frequently.
      
      Unfortunately, the RCU callback tracepoints are hidden behind
      CONFIG_RCU_TRACE which defaults to off in the upstream kernel and is
      likely to also be disabled in enterprise distribution configs.
      
      Enable it by default for CONFIG_TREE_RCU. However, we must keep it
      disabled for tiny RCU, because it would otherwise pull in a large
      amount of code that would make tiny RCU less than tiny.
      
      I ran some file system metadata intensive workloads (git checkout,
      FS-Mark) on a variety of machines with this patch and saw no
      detectable change in performance.
      
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      96151825
  6. 14 1月, 2017 1 次提交
  7. 12 1月, 2017 1 次提交
  8. 11 1月, 2017 2 次提交
  9. 10 1月, 2017 1 次提交
    • J
      siphash: add cryptographically secure PRF · 2c956a60
      Jason A. Donenfeld 提交于
      SipHash is a 64-bit keyed hash function that is actually a
      cryptographically secure PRF, like HMAC. Except SipHash is super fast,
      and is meant to be used as a hashtable keyed lookup function, or as a
      general PRF for short input use cases, such as sequence numbers or RNG
      chaining.
      
      For the first usage:
      
      There are a variety of attacks known as "hashtable poisoning" in which an
      attacker forms some data such that the hash of that data will be the
      same, and then preceeds to fill up all entries of a hashbucket. This is
      a realistic and well-known denial-of-service vector. Currently
      hashtables use jhash, which is fast but not secure, and some kind of
      rotating key scheme (or none at all, which isn't good). SipHash is meant
      as a replacement for jhash in these cases.
      
      There are a modicum of places in the kernel that are vulnerable to
      hashtable poisoning attacks, either via userspace vectors or network
      vectors, and there's not a reliable mechanism inside the kernel at the
      moment to fix it. The first step toward fixing these issues is actually
      getting a secure primitive into the kernel for developers to use. Then
      we can, bit by bit, port things over to it as deemed appropriate.
      
      While SipHash is extremely fast for a cryptographically secure function,
      it is likely a bit slower than the insecure jhash, and so replacements
      will be evaluated on a case-by-case basis based on whether or not the
      difference in speed is negligible and whether or not the current jhash usage
      poses a real security risk.
      
      For the second usage:
      
      A few places in the kernel are using MD5 or SHA1 for creating secure
      sequence numbers, syn cookies, port numbers, or fast random numbers.
      SipHash is a faster and more fitting, and more secure replacement for MD5
      in those situations. Replacing MD5 and SHA1 with SipHash for these uses is
      obvious and straight-forward, and so is submitted along with this patch
      series. There shouldn't be much of a debate over its efficacy.
      
      Dozens of languages are already using this internally for their hash
      tables and PRFs. Some of the BSDs already use this in their kernels.
      SipHash is a widely known high-speed solution to a widely known set of
      problems, and it's time we catch-up.
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: NJean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c956a60
  10. 25 12月, 2016 1 次提交
  11. 21 12月, 2016 1 次提交
  12. 15 12月, 2016 1 次提交
  13. 13 12月, 2016 2 次提交
  14. 19 11月, 2016 1 次提交
  15. 15 11月, 2016 1 次提交
  16. 01 11月, 2016 3 次提交
  17. 28 10月, 2016 1 次提交
  18. 24 10月, 2016 1 次提交
  19. 10 10月, 2016 1 次提交
  20. 29 9月, 2016 1 次提交
  21. 23 9月, 2016 1 次提交
  22. 31 8月, 2016 1 次提交
    • J
      mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS · 0d025d27
      Josh Poimboeuf 提交于
      There are three usercopy warnings which are currently being silenced for
      gcc 4.6 and newer:
      
      1) "copy_from_user() buffer size is too small" compile warning/error
      
         This is a static warning which happens when object size and copy size
         are both const, and copy size > object size.  I didn't see any false
         positives for this one.  So the function warning attribute seems to
         be working fine here.
      
         Note this scenario is always a bug and so I think it should be
         changed to *always* be an error, regardless of
         CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.
      
      2) "copy_from_user() buffer size is not provably correct" compile warning
      
         This is another static warning which happens when I enable
         __compiletime_object_size() for new compilers (and
         CONFIG_DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size
         is const, but copy size is *not*.  In this case there's no way to
         compare the two at build time, so it gives the warning.  (Note the
         warning is a byproduct of the fact that gcc has no way of knowing
         whether the overflow function will be called, so the call isn't dead
         code and the warning attribute is activated.)
      
         So this warning seems to only indicate "this is an unusual pattern,
         maybe you should check it out" rather than "this is a bug".
      
         I get 102(!) of these warnings with allyesconfig and the
         __compiletime_object_size() gcc check removed.  I don't know if there
         are any real bugs hiding in there, but from looking at a small
         sample, I didn't see any.  According to Kees, it does sometimes find
         real bugs.  But the false positive rate seems high.
      
      3) "Buffer overflow detected" runtime warning
      
         This is a runtime warning where object size is const, and copy size >
         object size.
      
      All three warnings (both static and runtime) were completely disabled
      for gcc 4.6 with the following commit:
      
        2fb0815c ("gcc4: disable __compiletime_object_size for GCC 4.6+")
      
      That commit mistakenly assumed that the false positives were caused by a
      gcc bug in __compiletime_object_size().  But in fact,
      __compiletime_object_size() seems to be working fine.  The false
      positives were instead triggered by #2 above.  (Though I don't have an
      explanation for why the warnings supposedly only started showing up in
      gcc 4.6.)
      
      So remove warning #2 to get rid of all the false positives, and re-enable
      warnings #1 and #3 by reverting the above commit.
      
      Furthermore, since #1 is a real bug which is detected at compile time,
      upgrade it to always be an error.
      
      Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
      needed.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0d025d27
  23. 03 8月, 2016 1 次提交
  24. 27 7月, 2016 2 次提交
    • J
      mm/page_owner: use stackdepot to store stacktrace · f2ca0b55
      Joonsoo Kim 提交于
      Currently, we store each page's allocation stacktrace on corresponding
      page_ext structure and it requires a lot of memory.  This causes the
      problem that memory tight system doesn't work well if page_owner is
      enabled.  Moreover, even with this large memory consumption, we cannot
      get full stacktrace because we allocate memory at boot time and just
      maintain 8 stacktrace slots to balance memory consumption.  We could
      increase it to more but it would make system unusable or change system
      behaviour.
      
      To solve the problem, this patch uses stackdepot to store stacktrace.
      It obviously provides memory saving but there is a drawback that
      stackdepot could fail.
      
      stackdepot allocates memory at runtime so it could fail if system has
      not enough memory.  But, most of allocation stack are generated at very
      early time and there are much memory at this time.  So, failure would
      not happen easily.  And, one failure means that we miss just one page's
      allocation stacktrace so it would not be a big problem.  In this patch,
      when memory allocation failure happens, we store special stracktrace
      handle to the page that is failed to save stacktrace.  With it, user can
      guess memory usage properly even if failure happens.
      
      Memory saving looks as following.  (4GB memory system with page_owner)
      (before the patch -> after the patch)
      
      static allocation:
      92274688 bytes -> 25165824 bytes
      
      dynamic allocation after boot + kernel build:
      0 bytes -> 327680 bytes
      
      total:
      92274688 bytes -> 25493504 bytes
      
      72% reduction in total.
      
      Note that implementation looks complex than someone would imagine
      because there is recursion issue.  stackdepot uses page allocator and
      page_owner is called at page allocation.  Using stackdepot in page_owner
      could re-call page allcator and then page_owner.  That is a recursion.
      To detect and avoid it, whenever we obtain stacktrace, recursion is
      checked and page_owner is set to dummy information if found.  Dummy
      information means that this page is allocated for page_owner feature
      itself (such as stackdepot) and it's understandable behavior for user.
      
      [iamjoonsoo.kim@lge.com: mm-page_owner-use-stackdepot-to-store-stacktrace-v3]
        Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com
        Link: http://lkml.kernel.org/r/1466150259-27727-7-git-send-email-iamjoonsoo.kim@lge.com
      Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f2ca0b55
    • K
      gcc-plugins: disable under COMPILE_TEST · a519167e
      Kees Cook 提交于
      Since adding the gcc plugin development headers is required for the
      gcc plugin support, we should ease into this new kernel build dependency
      more slowly. For now, disable the gcc plugins under COMPILE_TEST so that
      all*config builds will skip it.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NMichal Marek <mmarek@suse.com>
      a519167e
  25. 15 6月, 2016 2 次提交
  26. 08 6月, 2016 1 次提交
  27. 31 5月, 2016 1 次提交
  28. 29 5月, 2016 1 次提交
    • G
      <linux/hash.h>: Add support for architecture-specific functions · 468a9428
      George Spelvin 提交于
      This is just the infrastructure; there are no users yet.
      
      This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
      the existence of <asm/hash.h>.
      
      That file may define its own versions of various functions, and define
      HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.
      
      Included is a self-test (in lib/test_hash.c) that verifies the basics.
      It is NOT in general required that the arch-specific functions compute
      the same thing as the generic, but if a HAVE_* symbol is defined with
      the value 1, then equality is tested.
      Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Philippe De Muyter <phdm@macq.eu>
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: Alistair Francis <alistai@xilinx.com>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      468a9428
  29. 13 4月, 2016 1 次提交
    • N
      debugfs: prevent access to possibly dead file_operations at file open · 9fd4dcec
      Nicolai Stange 提交于
      Nothing prevents a dentry found by path lookup before a return of
      __debugfs_remove() to actually get opened after that return. Now, after
      the return of __debugfs_remove(), there are no guarantees whatsoever
      regarding the memory the corresponding inode's file_operations object
      had been kept in.
      
      Since __debugfs_remove() is seldomly invoked, usually from module exit
      handlers only, the race is hard to trigger and the impact is very low.
      
      A discussion of the problem outlined above as well as a suggested
      solution can be found in the (sub-)thread rooted at
      
        http://lkml.kernel.org/g/20130401203445.GA20862@ZenIV.linux.org.uk
        ("Yet another pipe related oops.")
      
      Basically, Greg KH suggests to introduce an intermediate fops and
      Al Viro points out that a pointer to the original ones may be stored in
      ->d_fsdata.
      
      Follow this line of reasoning:
      - Add SRCU as a reverse dependency of DEBUG_FS.
      - Introduce a srcu_struct object for the debugfs subsystem.
      - In debugfs_create_file(), store a pointer to the original
        file_operations object in ->d_fsdata.
      - Make debugfs_remove() and debugfs_remove_recursive() wait for a
        SRCU grace period after the dentry has been delete()'d and before they
        return to their callers.
      - Introduce an intermediate file_operations object named
        "debugfs_open_proxy_file_operations". It's ->open() functions checks,
        under the protection of a SRCU read lock, whether the dentry is still
        alive, i.e. has not been d_delete()'d and if so, tries to acquire a
        reference on the owning module.
        On success, it sets the file object's ->f_op to the original
        file_operations and forwards the ongoing open() call to the original
        ->open().
      - For clarity, rename the former debugfs_file_operations to
        debugfs_noop_file_operations -- they are in no way canonical.
      
      The choice of SRCU over "normal" RCU is justified by the fact, that the
      former may also be used to protect ->i_private data from going away
      during the execution of a file's readers and writers which may (and do)
      sleep.
      
      Finally, introduce the fs/debugfs/internal.h header containing some
      declarations internal to the debugfs implementation.
      Signed-off-by: NNicolai Stange <nicstange@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9fd4dcec
  30. 01 4月, 2016 1 次提交
  31. 23 3月, 2016 2 次提交
    • H
      parisc,metag: Implement CONFIG_DEBUG_STACK_USAGE option · 6c31da34
      Helge Deller 提交于
      On parisc and metag the stack grows upwards, so for those we need to
      scan the stack downwards in order to calculate how much stack a process
      has used.
      
      Tested on a 64bit parisc kernel.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      6c31da34
    • D
      kernel: add kcov code coverage · 5c9a8750
      Dmitry Vyukov 提交于
      kcov provides code coverage collection for coverage-guided fuzzing
      (randomized testing).  Coverage-guided fuzzing is a testing technique
      that uses coverage feedback to determine new interesting inputs to a
      system.  A notable user-space example is AFL
      (http://lcamtuf.coredump.cx/afl/).  However, this technique is not
      widely used for kernel testing due to missing compiler and kernel
      support.
      
      kcov does not aim to collect as much coverage as possible.  It aims to
      collect more or less stable coverage that is function of syscall inputs.
      To achieve this goal it does not collect coverage in soft/hard
      interrupts and instrumentation of some inherently non-deterministic or
      non-interesting parts of kernel is disbled (e.g.  scheduler, locking).
      
      Currently there is a single coverage collection mode (tracing), but the
      API anticipates additional collection modes.  Initially I also
      implemented a second mode which exposes coverage in a fixed-size hash
      table of counters (what Quentin used in his original patch).  I've
      dropped the second mode for simplicity.
      
      This patch adds the necessary support on kernel side.  The complimentary
      compiler support was added in gcc revision 231296.
      
      We've used this support to build syzkaller system call fuzzer, which has
      found 90 kernel bugs in just 2 months:
      
        https://github.com/google/syzkaller/wiki/Found-Bugs
      
      We've also found 30+ bugs in our internal systems with syzkaller.
      Another (yet unexplored) direction where kcov coverage would greatly
      help is more traditional "blob mutation".  For example, mounting a
      random blob as a filesystem, or receiving a random blob over wire.
      
      Why not gcov.  Typical fuzzing loop looks as follows: (1) reset
      coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
      typical coverage can be just a dozen of basic blocks (e.g.  an invalid
      input).  In such context gcov becomes prohibitively expensive as
      reset/collect coverage steps depend on total number of basic
      blocks/edges in program (in case of kernel it is about 2M).  Cost of
      kcov depends only on number of executed basic blocks/edges.  On top of
      that, kernel requires per-thread coverage because there are always
      background threads and unrelated processes that also produce coverage.
      With inlined gcov instrumentation per-thread coverage is not possible.
      
      kcov exposes kernel PCs and control flow to user-space which is
      insecure.  But debugfs should not be mapped as user accessible.
      
      Based on a patch by Quentin Casasnovas.
      
      [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
      [akpm@linux-foundation.org: unbreak allmodconfig]
      [akpm@linux-foundation.org: follow x86 Makefile layout standards]
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Tavis Ormandy <taviso@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: David Drysdale <drysdale@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5c9a8750