1. 21 2月, 2020 1 次提交
    • T
      sched: Provide cant_migrate() · 4e139c77
      Thomas Gleixner 提交于
      Some code pathes rely on preempt_disable() to prevent migration on a non RT
      enabled kernel. These preempt_disable/enable() pairs are substituted by
      migrate_disable/enable() pairs or other forms of RT specific protection. On
      RT these protections prevent migration but not preemption. Obviously a
      cant_sleep() check in such a section will trigger on RT because preemption
      is not disabled.
      
      Provide a cant_migrate() macro which maps to cant_sleep() on a non RT
      kernel and an empty placeholder for RT for now. The placeholder will be
      changed to a proper debug check along with the RT specific migration
      protection mechanism.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20200214161503.070487511@linutronix.de
      4e139c77
  2. 31 12月, 2019 1 次提交
  3. 05 12月, 2019 1 次提交
  4. 25 11月, 2019 1 次提交
  5. 07 9月, 2019 1 次提交
    • D
      kernel.h: Add non_block_start/end() · 312364f3
      Daniel Vetter 提交于
      In some special cases we must not block, but there's not a spinlock,
      preempt-off, irqs-off or similar critical section already that arms the
      might_sleep() debug checks. Add a non_block_start/end() pair to annotate
      these.
      
      This will be used in the oom paths of mmu-notifiers, where blocking is not
      allowed to make sure there's forward progress. Quoting Michal:
      
      "The notifier is called from quite a restricted context - oom_reaper -
      which shouldn't depend on any locks or sleepable conditionals. The code
      should be swift as well but we mostly do care about it to make a forward
      progress. Checking for sleepable context is the best thing we could come
      up with that would describe these demands at least partially."
      
      Peter also asked whether we want to catch spinlocks on top, but Michal
      said those are less of a problem because spinlocks can't have an indirect
      dependency upon the page allocator and hence close the loop with the oom
      reaper.
      
      Suggested by Michal Hocko.
      
      Link: https://lore.kernel.org/r/20190826201425.17547-4-daniel.vetter@ffwll.ch
      Acked-by: Christian König <christian.koenig@amd.com> (v1)
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      312364f3
  6. 17 7月, 2019 1 次提交
  7. 29 6月, 2019 1 次提交
  8. 15 5月, 2019 1 次提交
  9. 07 4月, 2019 1 次提交
    • C
      block: remove CONFIG_LBDAF · 72deb455
      Christoph Hellwig 提交于
      Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit
      architectures.  These types are required to support block device and/or
      file sizes larger than 2 TiB, and have generally defaulted to on for
      a long time.  Enabling the option only increases the i386 tinyconfig
      size by 145 bytes, and many data structures already always use
      64-bit values for their in-core and on-disk data structures anyway,
      so there should not be a large change in dynamic memory usage either.
      
      Dropping this option removes a somewhat weird non-default config that
      has cause various bugs or compiler warnings when actually used.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      72deb455
  10. 03 4月, 2019 1 次提交
    • J
      linux/kernel.h: Use parentheses around argument in u64_to_user_ptr() · a0fe2c64
      Jann Horn 提交于
      Use parentheses around uses of the argument in u64_to_user_ptr() to
      ensure that the cast doesn't apply to part of the argument.
      
      There are existing uses of the macro of the form
      
        u64_to_user_ptr(A + B)
      
      which expands to
      
        (void __user *)(uintptr_t)A + B
      
      (the cast applies to the first operand of the addition, the addition
      is a pointer addition). This happens to still work as intended, the
      semantic difference doesn't cause a difference in behavior.
      
      But I want to use u64_to_user_ptr() with a ternary operator in the
      argument, like so:
      
        u64_to_user_ptr(A ? B : C)
      
      This currently doesn't work as intended.
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NMukesh Ojha <mojha@codeaurora.org>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190329214652.258477-1-jannh@google.com
      a0fe2c64
  11. 08 3月, 2019 4 次提交
  12. 20 2月, 2019 1 次提交
  13. 05 1月, 2019 1 次提交
  14. 23 8月, 2018 1 次提交
  15. 21 6月, 2018 1 次提交
  16. 08 6月, 2018 1 次提交
  17. 27 5月, 2018 1 次提交
    • T
      PM / suspend: Prevent might sleep splats · c1a957d1
      Thomas Gleixner 提交于
      timekeeping suspend/resume calls read_persistent_clock() which takes
      rtc_lock. That results in might sleep warnings because at that point
      we run with interrupts disabled.
      
      We cannot convert rtc_lock to a raw spinlock as that would trigger
      other might sleep warnings.
      
      As a workaround we disable the might sleep warnings by setting
      system_state to SYSTEM_SUSPEND before calling sysdev_suspend() and
      restoring it to SYSTEM_RUNNING afer sysdev_resume(). There is no lock
      contention because hibernate / suspend to RAM is single-CPU at this
      point.
      
      In s2idle's case the system_state is set to SYSTEM_SUSPEND before
      timekeeping_suspend() which is invoked by the last CPU. In the resume
      case it set back to SYSTEM_RUNNING after timekeeping_resume() which is
      invoked by the first CPU in the resume case. The other CPUs will block
      on tick_freeze_lock.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      [bigeasy: cover s2idle in tick_freeze() / tick_unfreeze()]
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c1a957d1
  18. 26 4月, 2018 1 次提交
  19. 23 4月, 2018 1 次提交
    • N
      staging: lustre: add container_of_safe() · 05e6557b
      NeilBrown 提交于
      Luster has a container_of0() function which is similar to
      container_of() but passes an IS_ERR_OR_NULL() pointer through
      unchanged.
      This could be generally useful: bcache at last has a similar function.
      
      Naming is hard, but the precedent set by hlist_entry_safe() suggests
      a _safe suffix might be most consistent.
      
      So add container_of_safe() to kernel.h, and replace all occurrences of
      container_of0() with one of
        - list_first_entry, list_next_entry, when that is a better fit,
        - container_of(), when the pointer is used as a validpointer in
          surrounding code,
        - container_of_safe() when there is no obviously better alternative.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Reviewed-by: NJames Simmons <jsimmons@infradead.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05e6557b
  20. 12 4月, 2018 3 次提交
  21. 10 4月, 2018 1 次提交
    • L
      Fix subtle macro variable shadowing in min_not_zero() · e9092d0d
      Linus Torvalds 提交于
      Commit 3c8ba0d6 ("kernel.h: Retain constant expression output for
      max()/min()") rewrote our min/max macros to be very clever, but in the
      meantime resurrected a variable name shadow issue that we had had
      previously fixed in commit 589a9785 ("min/max: remove sparse
      warnings when they're nested").
      
      That commit talks about the sparse warnings that this shadowing causes,
      which we ignored as just a minor annoyance.  But it turns out that the
      sparse warning is the least of our problems.  We actually have a real
      bug due to the shadowing through the interaction with "min_not_zero()",
      which ends up doing
      
         min(__x, __y)
      
      internally, and then the new declaration of "__x" and "__y" as new
      variables in __cmp_once() results in a complete mess of an expression,
      and "min_not_zero()" doesn't work at all.
      
      For some odd reason, this only ever caused (reported) problems on s390,
      even though it is a generic issue and most of the (obviously successful)
      testing of the problematic commit had happened on other architectures.
      
      Quoting Sebastian Ott:
       "What happened is that the bio build by the partition detection code
        was attempted to be split by the block layer because the block queue
        had a max_sector setting of 0. blk_queue_max_hw_sectors uses
        min_not_zero."
      
      So re-introduce the use of __UNIQUE_ID() to make sure that the min/max
      macros do not have these kinds of clashes.
      
      [ That said, __UNIQUE_ID() itself has several issues that make it less
        than wonderful.
      
        In particular, the "uniqueness" has a fallback on the line number,
        which means that it's not actually unique in more complex cases if you
        don't build with gcc or clang (which have working unique counters that
        aren't tied to line numbers).
      
        That historical broken fallback also means that we have that pointless
        "prefix" argument that doesn't actually make much sense _except_ for
        the known-broken case. Oh well. ]
      
      Fixes: 3c8ba0d6 ("kernel.h: Retain constant expression output for max()/min()")
      Reported-and-tested-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e9092d0d
  22. 06 4月, 2018 1 次提交
    • K
      kernel.h: Retain constant expression output for max()/min() · 3c8ba0d6
      Kees Cook 提交于
      In the effort to remove all VLAs from the kernel[1], it is desirable to
      build with -Wvla.  However, this warning is overly pessimistic, in that
      it is only happy with stack array sizes that are declared as constant
      expressions, and not constant values.  One case of this is the
      evaluation of the max() macro which, due to its construction, ends up
      converting constant expression arguments into a constant value result.
      
      All attempts to rewrite this macro with __builtin_constant_p() failed
      with older compilers (e.g.  gcc 4.4)[2].  However, Martin Uecker,
      constructed[3] a mind-shattering solution that works everywhere.
      Cthulhu fhtagn!
      
      This patch updates the min()/max() macros to evaluate to a constant
      expression when called on constant expression arguments.  This removes
      several false-positive stack VLA warnings from an x86 allmodconfig build
      when -Wvla is added:
      
        $ diff -u before.txt after.txt | grep ^-
        -drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids variable length array ‘ids’ [-Wvla]
        -fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length array ‘namebuf’ [-Wvla]
        -lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ [-Wvla]
        -net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ [-Wvla]
        -net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ [-Wvla]
        -net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array ‘buff64’ [-Wvla]
      
      This also updates two cases where different enums were being compared
      and explicitly casts them to int (which matches the old side-effect of
      the single-evaluation code): one in tpm/tpm_tis_core.h, and one in
      drm/drm_color_mgmt.c.
      
       [1] https://lkml.org/lkml/2018/3/7/621
       [2] https://lkml.org/lkml/2018/3/10/170
       [3] https://lkml.org/lkml/2018/3/20/845Co-Developed-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Co-Developed-by: NMartin Uecker <Martin.Uecker@med.uni-goettingen.de>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3c8ba0d6
  23. 29 3月, 2018 1 次提交
  24. 21 2月, 2018 1 次提交
  25. 04 2月, 2018 1 次提交
  26. 18 11月, 2017 1 次提交
    • B
      kernel/panic.c: add TAINT_AUX · 4efb442c
      Borislav Petkov 提交于
      This is the gist of a patch which we've been forward-porting in our
      kernels for a long time now and it probably would make a good sense to
      have such TAINT_AUX flag upstream which can be used by each distro etc,
      how they see fit.  This way, we won't need to forward-port a distro-only
      version indefinitely.
      
      Add an auxiliary taint flag to be used by distros and others.  This
      obviates the need to forward-port whatever internal solutions people
      have in favor of a single flag which they can map arbitrarily to a
      definition of their pleasing.
      
      The "X" mnemonic could also mean eXternal, which would be taint from a
      distro or something else but not the upstream kernel.  We will use it to
      mark modules for which we don't provide support.  I.e., a really
      eXternal module.
      
      Link: http://lkml.kernel.org/r/20170911134533.dp5mtyku5bongx4c@pd.tnicSigned-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4efb442c
  27. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  28. 14 10月, 2017 1 次提交
  29. 09 9月, 2017 1 次提交
  30. 17 8月, 2017 1 次提交
    • K
      locking/refcounts, x86/asm: Implement fast refcount overflow protection · 7a46ec0e
      Kees Cook 提交于
      This implements refcount_t overflow protection on x86 without a noticeable
      performance impact, though without the fuller checking of REFCOUNT_FULL.
      
      This is done by duplicating the existing atomic_t refcount implementation
      but with normally a single instruction added to detect if the refcount
      has gone negative (e.g. wrapped past INT_MAX or below zero). When detected,
      the handler saturates the refcount_t to INT_MIN / 2. With this overflow
      protection, the erroneous reference release that would follow a wrap back
      to zero is blocked from happening, avoiding the class of refcount-overflow
      use-after-free vulnerabilities entirely.
      
      Only the overflow case of refcounting can be perfectly protected, since
      it can be detected and stopped before the reference is freed and left to
      be abused by an attacker. There isn't a way to block early decrements,
      and while REFCOUNT_FULL stops increment-from-zero cases (which would
      be the state _after_ an early decrement and stops potential double-free
      conditions), this fast implementation does not, since it would require
      the more expensive cmpxchg loops. Since the overflow case is much more
      common (e.g. missing a "put" during an error path), this protection
      provides real-world protection. For example, the two public refcount
      overflow use-after-free exploits published in 2016 would have been
      rendered unexploitable:
      
        http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/
      
        http://cyseclabs.com/page?n=02012016
      
      This implementation does, however, notice an unchecked decrement to zero
      (i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
      resulted in a zero). Decrements under zero are noticed (since they will
      have resulted in a negative value), though this only indicates that a
      use-after-free may have already happened. Such notifications are likely
      avoidable by an attacker that has already exploited a use-after-free
      vulnerability, but it's better to have them reported than allow such
      conditions to remain universally silent.
      
      On first overflow detection, the refcount value is reset to INT_MIN / 2
      (which serves as a saturation value) and a report and stack trace are
      produced. When operations detect only negative value results (such as
      changing an already saturated value), saturation still happens but no
      notification is performed (since the value was already saturated).
      
      On the matter of races, since the entire range beyond INT_MAX but before
      0 is negative, every operation at INT_MIN / 2 will trap, leaving no
      overflow-only race condition.
      
      As for performance, this implementation adds a single "js" instruction
      to the regular execution flow of a copy of the standard atomic_t refcount
      operations. (The non-"and_test" refcount_dec() function, which is uncommon
      in regular refcount design patterns, has an additional "jz" instruction
      to detect reaching exactly zero.) Since this is a forward jump, it is by
      default the non-predicted path, which will be reinforced by dynamic branch
      prediction. The result is this protection having virtually no measurable
      change in performance over standard atomic_t operations. The error path,
      located in .text.unlikely, saves the refcount location and then uses UD0
      to fire a refcount exception handler, which resets the refcount, handles
      reporting, and returns to regular execution. This keeps the changes to
      .text size minimal, avoiding return jumps and open-coded calls to the
      error reporting routine.
      
      Example assembly comparison:
      
      refcount_inc() before:
      
        .text:
        ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
      
      refcount_inc() after:
      
        .text:
        ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
        ffffffff8154614d:       0f 88 80 d5 17 00       js     ffffffff816c36d3
        ...
        .text.unlikely:
        ffffffff816c36d3:       48 8d 4d f4             lea    -0xc(%rbp),%rcx
        ffffffff816c36d7:       0f ff                   (bad)
      
      These are the cycle counts comparing a loop of refcount_inc() from 1
      to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
      unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
      (refcount_t-full), and this overflow-protected refcount (refcount_t-fast):
      
        2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
      		    cycles		protections
        atomic_t           82249267387	none
        refcount_t-fast    82211446892	overflow, untested dec-to-zero
        refcount_t-full   144814735193	overflow, untested dec-to-zero, inc-from-zero
      
      This code is a modified version of the x86 PAX_REFCOUNT atomic_t
      overflow defense from the last public patch of PaX/grsecurity, based
      on my understanding of the code. Changes or omissions from the original
      code are mine and don't reflect the original grsecurity/PaX code. Thanks
      to PaX Team for various suggestions for improvement for repurposing this
      code to be a refcount-only protection.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Elena Reshetova <elena.reshetova@intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Hans Liljestrand <ishkamiel@gmail.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Serge E. Hallyn <serge@hallyn.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arozansk@redhat.com
      Cc: axboe@kernel.dk
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-arch <linux-arch@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20170815161924.GA133115@beastSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7a46ec0e
  31. 13 7月, 2017 1 次提交
    • I
      kernel.h: handle pointers to arrays better in container_of() · c7acec71
      Ian Abbott 提交于
      If the first parameter of container_of() is a pointer to a
      non-const-qualified array type (and the third parameter names a
      non-const-qualified array member), the local variable __mptr will be
      defined with a const-qualified array type.  In ISO C, these types are
      incompatible.  They work as expected in GNU C, but some versions will
      issue warnings.  For example, GCC 4.9 produces the warning
      "initialization from incompatible pointer type".
      
      Here is an example of where the problem occurs:
      
      -------------------------------------------------------
         #include <linux/kernel.h>
         #include <linux/module.h>
      
        MODULE_LICENSE("GPL");
      
        struct st {
        	int a;
        	char b[16];
        };
      
        static int __init example_init(void) {
        	struct st t = { .a = 101, .b = "hello" };
        	char (*p)[16] = &t.b;
        	struct st *x = container_of(p, struct st, b);
        	printk(KERN_DEBUG "%p %p\n", (void *)&t, (void *)x);
        	return 0;
        }
      
        static void __exit example_exit(void) {
        }
      
        module_init(example_init);
        module_exit(example_exit);
      -------------------------------------------------------
      
      Building the module with gcc-4.9 results in these warnings (where '{m}'
      is the module source and '{k}' is the kernel source):
      
      -------------------------------------------------------
        In file included from {m}/example.c:1:0:
        {m}/example.c: In function `example_init':
        {k}/include/linux/kernel.h:854:48: warning: initialization from incompatible pointer type
          const typeof( ((type *)0)->member ) *__mptr = (ptr); \
                                                        ^
        {m}/example.c:14:17: note: in expansion of macro `container_of'
          struct st *x = container_of(p, struct st, b);
                         ^
        {k}/include/linux/kernel.h:854:48: warning: (near initialization for `x')
          const typeof( ((type *)0)->member ) *__mptr = (ptr); \
                                                        ^
        {m}/example.c:14:17: note: in expansion of macro `container_of'
          struct st *x = container_of(p, struct st, b);
                         ^
      -------------------------------------------------------
      
      Replace the type checking performed by the macro to avoid these
      warnings.  Make sure `*(ptr)` either has type compatible with the
      member, or has type compatible with `void`, ignoring qualifiers.  Raise
      compiler errors if this is not true.  This is stronger than the previous
      behaviour, which only resulted in compiler warnings for a type mismatch.
      
      [arnd@arndb.de: fix new warnings for container_of()]
        Link: http://lkml.kernel.org/r/20170620200940.90557-1-arnd@arndb.de
      Link: http://lkml.kernel.org/r/20170525120316.24473-7-abbotti@mev.co.ukSigned-off-by: NIan Abbott <abbotti@mev.co.uk>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexander Potapenko <glider@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c7acec71
  32. 23 5月, 2017 1 次提交
  33. 21 4月, 2017 1 次提交
  34. 18 4月, 2017 1 次提交
    • B
      boot/param: Move next_arg() function to lib/cmdline.c for later reuse · f51b17c8
      Baoquan He 提交于
      next_arg() will be used to parse boot parameters in the x86/boot/compressed code,
      so move it to lib/cmdline.c for better code reuse.
      
      No change in functionality.
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Larry Finger <Larry.Finger@lwfinger.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Cc: dave.jiang@intel.com
      Cc: dyoung@redhat.com
      Cc: keescook@chromium.org
      Cc: zijun_hu <zijun_hu@htc.com>
      Link: http://lkml.kernel.org/r/1492436099-4017-2-git-send-email-bhe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f51b17c8
  35. 25 2月, 2017 1 次提交