1. 15 10月, 2021 1 次提交
  2. 28 7月, 2021 1 次提交
  3. 26 10月, 2020 1 次提交
  4. 17 10月, 2020 1 次提交
  5. 15 9月, 2020 1 次提交
  6. 29 8月, 2020 1 次提交
  7. 13 8月, 2020 5 次提交
  8. 27 7月, 2020 1 次提交
  9. 09 6月, 2020 2 次提交
    • G
      panic: add sysctl to dump all CPUs backtraces on oops event · 60c958d8
      Guilherme G. Piccoli 提交于
      Usually when the kernel reaches an oops condition, it's a point of no
      return; in case not enough debug information is available in the kernel
      splat, one of the last resorts would be to collect a kernel crash dump
      and analyze it.  The problem with this approach is that in order to
      collect the dump, a panic is required (to kexec-load the crash kernel).
      When in an environment of multiple virtual machines, users may prefer to
      try living with the oops, at least until being able to properly shutdown
      their VMs / finish their important tasks.
      
      This patch implements a way to collect a bit more debug details when an
      oops event is reached, by printing all the CPUs backtraces through the
      usage of NMIs (on architectures that support that).  The sysctl added
      (and documented) here was called "oops_all_cpu_backtrace", and when set
      will (as the name suggests) dump all CPUs backtraces.
      
      Far from ideal, this may be the last option though for users that for
      some reason cannot panic on oops.  Most of times oopses are clear enough
      to indicate the kernel portion that must be investigated, but in virtual
      environments it's possible to observe hypervisor/KVM issues that could
      lead to oopses shown in other guests CPUs (like virtual APIC crashes).
      This patch hence aims to help debug such complex issues without
      resorting to kdump.
      Signed-off-by: NGuilherme G. Piccoli <gpiccoli@canonical.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: http://lkml.kernel.org/r/20200327224116.21030-1-gpiccoli@canonical.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      60c958d8
    • R
      kernel: add panic_on_taint · db38d5c1
      Rafael Aquini 提交于
      Analogously to the introduction of panic_on_warn, this patch introduces
      a kernel option named panic_on_taint in order to provide a simple and
      generic way to stop execution and catch a coredump when the kernel gets
      tainted by any given flag.
      
      This is useful for debugging sessions as it avoids having to rebuild the
      kernel to explicitly add calls to panic() into the code sites that
      introduce the taint flags of interest.
      
      For instance, if one is interested in proceeding with a post-mortem
      analysis at the point a given code path is hitting a bad page (i.e.
      unaccount_page_cache_page(), or slab_bug()), a coredump can be collected
      by rebooting the kernel with 'panic_on_taint=0x20' amended to the
      command line.
      
      Another, perhaps less frequent, use for this option would be as a means
      for assuring a security policy case where only a subset of taints, or no
      single taint (in paranoid mode), is allowed for the running system.  The
      optional switch 'nousertaint' is handy in this particular scenario, as
      it will avoid userspace induced crashes by writes to sysctl interface
      /proc/sys/kernel/tainted causing false positive hits for such policies.
      
      [akpm@linux-foundation.org: tweak kernel-parameters.txt wording]
      Suggested-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NRafael Aquini <aquini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NLuis Chamberlain <mcgrof@kernel.org>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Bunk <bunk@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Link: http://lkml.kernel.org/r/20200515175502.146720-1-aquini@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      db38d5c1
  10. 21 2月, 2020 1 次提交
    • T
      sched: Provide cant_migrate() · 4e139c77
      Thomas Gleixner 提交于
      Some code pathes rely on preempt_disable() to prevent migration on a non RT
      enabled kernel. These preempt_disable/enable() pairs are substituted by
      migrate_disable/enable() pairs or other forms of RT specific protection. On
      RT these protections prevent migration but not preemption. Obviously a
      cant_sleep() check in such a section will trigger on RT because preemption
      is not disabled.
      
      Provide a cant_migrate() macro which maps to cant_sleep() on a non RT
      kernel and an empty placeholder for RT for now. The placeholder will be
      changed to a proper debug check along with the RT specific migration
      protection mechanism.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20200214161503.070487511@linutronix.de
      4e139c77
  11. 31 12月, 2019 1 次提交
  12. 05 12月, 2019 1 次提交
  13. 25 11月, 2019 1 次提交
  14. 07 9月, 2019 1 次提交
    • D
      kernel.h: Add non_block_start/end() · 312364f3
      Daniel Vetter 提交于
      In some special cases we must not block, but there's not a spinlock,
      preempt-off, irqs-off or similar critical section already that arms the
      might_sleep() debug checks. Add a non_block_start/end() pair to annotate
      these.
      
      This will be used in the oom paths of mmu-notifiers, where blocking is not
      allowed to make sure there's forward progress. Quoting Michal:
      
      "The notifier is called from quite a restricted context - oom_reaper -
      which shouldn't depend on any locks or sleepable conditionals. The code
      should be swift as well but we mostly do care about it to make a forward
      progress. Checking for sleepable context is the best thing we could come
      up with that would describe these demands at least partially."
      
      Peter also asked whether we want to catch spinlocks on top, but Michal
      said those are less of a problem because spinlocks can't have an indirect
      dependency upon the page allocator and hence close the loop with the oom
      reaper.
      
      Suggested by Michal Hocko.
      
      Link: https://lore.kernel.org/r/20190826201425.17547-4-daniel.vetter@ffwll.ch
      Acked-by: Christian König <christian.koenig@amd.com> (v1)
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      312364f3
  15. 17 7月, 2019 1 次提交
  16. 29 6月, 2019 1 次提交
  17. 15 5月, 2019 1 次提交
  18. 07 4月, 2019 1 次提交
    • C
      block: remove CONFIG_LBDAF · 72deb455
      Christoph Hellwig 提交于
      Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit
      architectures.  These types are required to support block device and/or
      file sizes larger than 2 TiB, and have generally defaulted to on for
      a long time.  Enabling the option only increases the i386 tinyconfig
      size by 145 bytes, and many data structures already always use
      64-bit values for their in-core and on-disk data structures anyway,
      so there should not be a large change in dynamic memory usage either.
      
      Dropping this option removes a somewhat weird non-default config that
      has cause various bugs or compiler warnings when actually used.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      72deb455
  19. 03 4月, 2019 1 次提交
    • J
      linux/kernel.h: Use parentheses around argument in u64_to_user_ptr() · a0fe2c64
      Jann Horn 提交于
      Use parentheses around uses of the argument in u64_to_user_ptr() to
      ensure that the cast doesn't apply to part of the argument.
      
      There are existing uses of the macro of the form
      
        u64_to_user_ptr(A + B)
      
      which expands to
      
        (void __user *)(uintptr_t)A + B
      
      (the cast applies to the first operand of the addition, the addition
      is a pointer addition). This happens to still work as intended, the
      semantic difference doesn't cause a difference in behavior.
      
      But I want to use u64_to_user_ptr() with a ternary operator in the
      argument, like so:
      
        u64_to_user_ptr(A ? B : C)
      
      This currently doesn't work as intended.
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NMukesh Ojha <mojha@codeaurora.org>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190329214652.258477-1-jannh@google.com
      a0fe2c64
  20. 08 3月, 2019 4 次提交
  21. 20 2月, 2019 1 次提交
  22. 05 1月, 2019 1 次提交
  23. 23 8月, 2018 1 次提交
  24. 21 6月, 2018 1 次提交
  25. 08 6月, 2018 1 次提交
  26. 27 5月, 2018 1 次提交
    • T
      PM / suspend: Prevent might sleep splats · c1a957d1
      Thomas Gleixner 提交于
      timekeeping suspend/resume calls read_persistent_clock() which takes
      rtc_lock. That results in might sleep warnings because at that point
      we run with interrupts disabled.
      
      We cannot convert rtc_lock to a raw spinlock as that would trigger
      other might sleep warnings.
      
      As a workaround we disable the might sleep warnings by setting
      system_state to SYSTEM_SUSPEND before calling sysdev_suspend() and
      restoring it to SYSTEM_RUNNING afer sysdev_resume(). There is no lock
      contention because hibernate / suspend to RAM is single-CPU at this
      point.
      
      In s2idle's case the system_state is set to SYSTEM_SUSPEND before
      timekeeping_suspend() which is invoked by the last CPU. In the resume
      case it set back to SYSTEM_RUNNING after timekeeping_resume() which is
      invoked by the first CPU in the resume case. The other CPUs will block
      on tick_freeze_lock.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      [bigeasy: cover s2idle in tick_freeze() / tick_unfreeze()]
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c1a957d1
  27. 26 4月, 2018 1 次提交
  28. 23 4月, 2018 1 次提交
    • N
      staging: lustre: add container_of_safe() · 05e6557b
      NeilBrown 提交于
      Luster has a container_of0() function which is similar to
      container_of() but passes an IS_ERR_OR_NULL() pointer through
      unchanged.
      This could be generally useful: bcache at last has a similar function.
      
      Naming is hard, but the precedent set by hlist_entry_safe() suggests
      a _safe suffix might be most consistent.
      
      So add container_of_safe() to kernel.h, and replace all occurrences of
      container_of0() with one of
        - list_first_entry, list_next_entry, when that is a better fit,
        - container_of(), when the pointer is used as a validpointer in
          surrounding code,
        - container_of_safe() when there is no obviously better alternative.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Reviewed-by: NJames Simmons <jsimmons@infradead.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05e6557b
  29. 12 4月, 2018 3 次提交
  30. 10 4月, 2018 1 次提交
    • L
      Fix subtle macro variable shadowing in min_not_zero() · e9092d0d
      Linus Torvalds 提交于
      Commit 3c8ba0d6 ("kernel.h: Retain constant expression output for
      max()/min()") rewrote our min/max macros to be very clever, but in the
      meantime resurrected a variable name shadow issue that we had had
      previously fixed in commit 589a9785 ("min/max: remove sparse
      warnings when they're nested").
      
      That commit talks about the sparse warnings that this shadowing causes,
      which we ignored as just a minor annoyance.  But it turns out that the
      sparse warning is the least of our problems.  We actually have a real
      bug due to the shadowing through the interaction with "min_not_zero()",
      which ends up doing
      
         min(__x, __y)
      
      internally, and then the new declaration of "__x" and "__y" as new
      variables in __cmp_once() results in a complete mess of an expression,
      and "min_not_zero()" doesn't work at all.
      
      For some odd reason, this only ever caused (reported) problems on s390,
      even though it is a generic issue and most of the (obviously successful)
      testing of the problematic commit had happened on other architectures.
      
      Quoting Sebastian Ott:
       "What happened is that the bio build by the partition detection code
        was attempted to be split by the block layer because the block queue
        had a max_sector setting of 0. blk_queue_max_hw_sectors uses
        min_not_zero."
      
      So re-introduce the use of __UNIQUE_ID() to make sure that the min/max
      macros do not have these kinds of clashes.
      
      [ That said, __UNIQUE_ID() itself has several issues that make it less
        than wonderful.
      
        In particular, the "uniqueness" has a fallback on the line number,
        which means that it's not actually unique in more complex cases if you
        don't build with gcc or clang (which have working unique counters that
        aren't tied to line numbers).
      
        That historical broken fallback also means that we have that pointless
        "prefix" argument that doesn't actually make much sense _except_ for
        the known-broken case. Oh well. ]
      
      Fixes: 3c8ba0d6 ("kernel.h: Retain constant expression output for max()/min()")
      Reported-and-tested-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e9092d0d