1. 24 January 2023 (3 commits)
    • panic: Introduce warn_limit · f53b6dda
      Authored by Kees Cook
      commit 9fc9e278 upstream.
      
      Like oops_limit, add warn_limit for limiting the number of warnings when
      panic_on_warn is not set.
      
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: tangmeng <tangmeng@uniontech.com>
      Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: linux-doc@vger.kernel.org
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20221117234328.594699-5-keescook@chromium.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
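The warn_limit semantics described above can be sketched in a few lines. This is a minimal illustration, not the kernel code: `check_warn` and its parameters are hypothetical names; the real counter lives in kernel/panic.c and the limit is read from the `kernel.warn_limit` sysctl.

```python
# Sketch of the warn_limit logic (assumed names; the real implementation
# panics via the kernel's panic() routine).
warn_count = 0

def check_warn(warn_limit, panic_on_warn=False):
    """Count one WARN(); report whether the kernel would panic.

    warn_limit only applies when panic_on_warn is not set, and
    warn_limit == 0 disables the counter check, mirroring oops_limit.
    """
    global warn_count
    warn_count += 1
    if panic_on_warn:
        return True  # panic immediately on any warning
    return warn_limit != 0 and warn_count >= warn_limit

# With warn_limit=3, the third warning crosses the limit.
results = [check_warn(3) for _ in range(3)]
```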
    • exit: Allow oops_limit to be disabled · e0738725
      Authored by Kees Cook
      commit de92f657 upstream.
      
      In preparation for keeping oops_limit logic in sync with warn_limit,
      have oops_limit == 0 disable checking the Oops counter.
      
      Cc: Jann Horn <jannh@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linux-doc@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
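The disable semantics are simple enough to state as a predicate. A minimal sketch, with hypothetical names (the kernel keeps the counter internally and reads the limit from `kernel.oops_limit`):

```python
def oops_may_panic(oops_count, oops_limit):
    """oops_limit == 0 disables the Oops counter check entirely;
    otherwise the system panics once the count reaches the limit."""
    return oops_limit != 0 and oops_count >= oops_limit

checks = [
    oops_may_panic(10_000, 0),       # limit 0: never panics from the counter
    oops_may_panic(9_999, 10_000),   # under the limit
    oops_may_panic(10_000, 10_000),  # limit reached
]
```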
    • exit: Put an upper limit on how often we can oops · 767997ef
      Authored by Jann Horn
      commit d4ccd54d upstream.
      
      Many Linux systems are configured to not panic on oops; but allowing an
      attacker to oops the system **really** often can make even bugs that look
      completely unexploitable exploitable (like NULL dereferences and such) if
      each crash elevates a refcount by one or a lock is taken in read mode, and
      this causes a counter to eventually overflow.
      
      The most interesting counters for this are 32 bits wide (like open-coded
      refcounts that don't use refcount_t). (The ldsem reader count on 32-bit
      platforms is just 16 bits, but probably nobody cares about 32-bit platforms
      that much nowadays.)
      
      So let's panic the system if the kernel is constantly oopsing.
      
      The speed of oopsing 2^32 times probably depends on several factors, like
      how long the stack trace is and which unwinder you're using; an empirically
      important one is whether your console is showing a graphical environment or
      a text console that oopses will be printed to.
      In a quick single-threaded benchmark, it looks like oopsing in a vfork()
      child with a very short stack trace only takes ~510 microseconds per run
      when a graphical console is active; but switching to a text console that
      oopses are printed to slows it down around 87x, to ~45 milliseconds per
      run.
      (Adding more threads makes this faster, but the actual oops printing
      happens under &die_lock on x86, so you can maybe speed this up by a factor
      of around 2 and then any further improvement gets eaten up by lock
      contention.)
      
      It looks like it would take around 8-12 days to overflow a 32-bit counter
      with repeated oopsing on a multi-core X86 system running a graphical
      environment; both me (in an X86 VM) and Seth (with a distro kernel on
      normal hardware in a standard configuration) got numbers in that ballpark.
      
      12 days aren't *that* short on a desktop system, and you'd likely need much
      longer on a typical server system (assuming that people don't run graphical
      desktop environments on their servers), and this is a *very* noisy and
      violent approach to exploiting the kernel; and it also seems to take orders
      of magnitude longer on some machines, probably because stuff like EFI
      pstore will slow it down a ton if that's active.
      Signed-off-by: Jann Horn <jannh@google.com>
      Link: https://lore.kernel.org/r/20221107201317.324457-1-jannh@google.com
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20221117234328.594699-2-keescook@chromium.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
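The 8-12 day estimate above can be checked with back-of-the-envelope arithmetic from the numbers quoted in the message (~510 microseconds per oops on a graphical console, and roughly a 2x ceiling from die_lock contention with more threads):

```python
# Overflowing a 32-bit counter by repeated oopsing, using the figures
# quoted in the commit message.
SECONDS_PER_DAY = 86_400
per_oops_s = 510e-6  # ~510 us per oops, graphical console, single thread

single_threaded_days = (2**32 * per_oops_s) / SECONDS_PER_DAY  # ~25 days
multi_core_days = single_threaded_days / 2                     # ~12.7 days
```

which lands in the same ballpark as the measurements reported above; a text console (~87x slower per oops) pushes this far beyond practicality.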
  2. 31 December 2022 (1 commit)
  3. 12 September 2022 (2 commits)
    • kernel/utsname_sysctl.c: print kernel arch · bfca3dd3
      Authored by Petr Vorel
      Print the machine hardware name (UTS_MACHINE) in /proc/sys/kernel/arch.
      
      This helps people debugging the kernel from an initramfs with a minimal
      environment (i.e. without coreutils or even busybox), and allows opening
      the sysctl file instead of running 'uname -m' from high-level languages.
      
      Link: https://lkml.kernel.org/r/20220901194403.3819-1-pvorel@suse.cz
      Signed-off-by: Petr Vorel <pvorel@suse.cz>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: David Sterba <dsterba@suse.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
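A user-space consumer might read the new file with a fallback, since `/proc/sys/kernel/arch` only exists on kernels that include this patch. A small sketch (the fallback via `platform.machine()` is this example's choice, not part of the commit):

```python
import platform

def kernel_arch():
    """Return the machine hardware name (UTS_MACHINE).

    Prefers /proc/sys/kernel/arch where available; falls back to
    uname(2) via platform.machine() on older kernels or non-Linux
    systems.
    """
    try:
        with open("/proc/sys/kernel/arch") as f:
            return f.read().strip()
    except OSError:
        return platform.machine()

arch = kernel_arch()
```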
    • memory tiering: rate limit NUMA migration throughput · c6833e10
      Authored by Huang Ying
      In NUMA balancing memory tiering mode, if there are hot pages in the slow
      memory node and cold pages in the fast memory node, we need to
      promote/demote hot/cold pages between the fast and slow memory nodes.
      
      One choice is to promote/demote as fast as possible.  But the CPU cycles
      and memory bandwidth consumed by a high promoting/demoting throughput will
      hurt the latency of some workloads, because of access latency inflation
      and slow memory bandwidth contention.
      
      A way to resolve this issue is to restrict the max promoting/demoting
      throughput.  It will take longer to finish the promoting/demoting.  But
      the workload latency will be better.  This is implemented in this patch as
      the page promotion rate limit mechanism.
      
      The number of candidate pages to be promoted to the fast memory node via
      NUMA balancing is counted; if the count exceeds the limit specified by the
      user, NUMA balancing promotion is stopped until the next second.
      
      A new sysctl knob kernel.numa_balancing_promote_rate_limit_MBps is added
      for the users to specify the limit.
      
      Link: https://lkml.kernel.org/r/20220713083954.34196-3-ying.huang@intel.com
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: osalvador <osalvador@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Wei Xu <weixugc@google.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zhong Jiang <zhongjiang-ali@linux.alibaba.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
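The per-second windowing described above can be sketched as a tiny rate limiter. The class and method names are hypothetical (the real knob is `kernel.numa_balancing_promote_rate_limit_MBps` and the kernel counts in MB, not pages, per second):

```python
class PromoteRateLimit:
    """Per-second promotion rate limit, as described in the commit."""

    def __init__(self, limit_per_sec):
        self.limit = limit_per_sec
        self.window_start = 0  # current one-second window
        self.candidates = 0    # candidates counted in this window

    def try_promote(self, now_sec, npages):
        """Count candidate pages; refuse promotion once the count
        exceeds the limit, until the next second starts."""
        if now_sec != self.window_start:  # new second: reset the counter
            self.window_start = now_sec
            self.candidates = 0
        self.candidates += npages
        return self.candidates <= self.limit

rl = PromoteRateLimit(limit_per_sec=100)
first = rl.try_promote(0, 60)   # within the limit: allowed
second = rl.try_promote(0, 60)  # over the limit: throttled
third = rl.try_promote(1, 60)   # next second: allowed again
```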
  4. 27 July 2022 (1 commit)
    • powerpc/pseries/mobility: set NMI watchdog factor during an LPM · 118b1366
      Authored by Laurent Dufour
      During an LPM, while the memory transfer is in progress on the arrival
      side, some latencies are generated when accessing not yet transferred
      pages on the arrival side. Thus, the NMI watchdog may be triggered too
      frequently, which increases the risk of hitting an NMI interrupt in a bad
      place in the kernel, leading to a kernel panic.
      
      Disabling the hard lockup watchdog until the memory transfer completes
      could be too strong a workaround; some users would want this timeout to
      eventually trigger if the system hangs even during an LPM.
      
      Introduce a new sysctl variable, nmi_watchdog_factor. It allows applying a
      factor to the NMI watchdog timeout during an LPM. Just before the CPUs are
      stopped for the switchover sequence, the NMI watchdog timer is set to
      watchdog_thresh + factor%.
      
      A value of 0 has no effect. The default value is 200, meaning that the
      NMI watchdog is set to 30s during LPM (based on a 10s watchdog_thresh
      value). Once the memory transfer is achieved, the factor is reset to 0.
      
      Setting this value to a high number is like disabling the NMI watchdog
      during an LPM.
      Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20220713154729.80789-5-ldufour@linux.ibm.com
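The timeout arithmetic above (watchdog_thresh + factor%) is easy to state as a one-liner. A sketch with a hypothetical function name, reproducing the worked example from the commit (default factor 200 turns a 10 s threshold into 30 s):

```python
def lpm_watchdog_timeout(watchdog_thresh_sec, factor_pct):
    """NMI watchdog timeout during an LPM: watchdog_thresh + factor%.

    factor_pct == 0 leaves the timeout unchanged; very large values
    effectively disable the NMI watchdog for the duration of the LPM.
    """
    return watchdog_thresh_sec * (100 + factor_pct) // 100

default_timeout = lpm_watchdog_timeout(10, 200)  # default factor: 30 s
unchanged = lpm_watchdog_timeout(10, 0)          # factor 0: no effect
```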
  5. 25 June 2022 (1 commit)
  6. 14 May 2022 (1 commit)
  7. 02 May 2022 (1 commit)
  8. 24 March 2022 (2 commits)
  9. 23 March 2022 (1 commit)
    • NUMA balancing: optimize page placement for memory tiering system · c574bbe9
      Authored by Huang Ying
      With the advent of various new memory types, some machines will have
      multiple types of memory, e.g.  DRAM and PMEM (persistent memory).  The
      memory subsystem of these machines can be called memory tiering system,
      because the performance of the different types of memory are usually
      different.
      
      In such a system, because the memory access pattern changes over time,
      some pages in the slow memory may become hot globally.  So in this patch,
      the NUMA balancing mechanism is enhanced to optimize page placement among
      the different memory types dynamically, according to hot/cold status.
      
      In a typical memory tiering system, there are CPUs, fast memory and slow
      memory in each physical NUMA node.  The CPUs and the fast memory will be
      put in one logical node (called fast memory node), while the slow memory
      will be put in another (faked) logical node (called slow memory node).
      That is, the fast memory is regarded as local while the slow memory is
      regarded as remote.  So it's possible for the recently accessed pages in
      the slow memory node to be promoted to the fast memory node via the
      existing NUMA balancing mechanism.
      
      The original NUMA balancing mechanism stops migrating pages once the free
      memory of the target node falls below the high watermark.  This is a
      reasonable policy when there is only one memory type, but it renders the
      original NUMA balancing mechanism almost useless for optimizing page
      placement among different memory types.  Details are as follows.
      
      It is common for the working-set size of the workload to be larger than
      the size of the fast memory nodes; otherwise it would be unnecessary to
      use the slow memory at all.  So there are almost never enough free pages
      in the fast memory nodes, and the globally hot pages in the slow memory
      node cannot be promoted to the fast memory node.  To solve this issue, we
      have 2 choices as follows,
      
      a. Ignore the free pages watermark checking when promoting hot pages
         from the slow memory node to the fast memory node.  This will
         create some memory pressure in the fast memory node, thus trigger
         the memory reclaiming.  So that, the cold pages in the fast memory
         node will be demoted to the slow memory node.
      
      b. Define a new watermark called wmark_promo which is higher than
         wmark_high, and have kswapd reclaiming pages until free pages reach
         such watermark.  The scenario is as follows: when we want to promote
         hot-pages from a slow memory to a fast memory, but fast memory's free
         pages would go lower than high watermark with such promotion, we wake
         up kswapd with wmark_promo watermark in order to demote cold pages and
         free us up some space.  So, next time we want to promote hot-pages we
         might have a chance of doing so.
      
      Choice "a" may create high memory pressure in the fast memory node.  If
      the memory pressure of the workload is high, the combined pressure may
      become so high that the workload's memory allocation latency is affected,
      e.g. direct reclaim may be triggered.
      
      Choice "b" works much better in this respect.  If the memory pressure of
      the workload is high, hot page promotion stops earlier, because its
      allocation watermark is higher than that of normal memory allocation.  So
      choice "b" is implemented in this patch: a new zone watermark
      (WMARK_PROMO) is added, which is larger than the high watermark and can
      be controlled via watermark_scale_factor.
      
      In addition to the original page placement optimization among sockets, the
      NUMA balancing mechanism is extended to optimize page placement according
      to hot/cold status among different memory types.  The sysctl user space
      interface (numa_balancing) is therefore extended in a backward compatible
      way as follows, so that users can enable/disable each piece of
      functionality individually.
      
      The sysctl is converted from a Boolean value to a bit field.  The
      definition of the flags is:
      
      - 0: NUMA_BALANCING_DISABLED
      - 1: NUMA_BALANCING_NORMAL
      - 2: NUMA_BALANCING_MEMORY_TIERING
      
      We have tested the patch with the pmbench memory accessing benchmark
      with the 80:20 read/write ratio and the Gauss access address
      distribution on a 2 socket Intel server with Optane DC Persistent
      Memory Model.  The test results show that the pmbench score can improve
      by up to 95.9%.
      
      Thanks to Andrew Morton for helping fix the documentation format error.
      
      Link: https://lkml.kernel.org/r/20220221084529.1052339-3-ying.huang@intel.com
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Wei Xu <weixugc@google.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Feng Tang <feng.tang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
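The bit-field encoding of the numa_balancing sysctl can be illustrated directly from the flag values listed above (the helper names below are this sketch's own):

```python
# Flag values from the commit text for /proc/sys/kernel/numa_balancing.
NUMA_BALANCING_DISABLED = 0x0
NUMA_BALANCING_NORMAL = 0x1
NUMA_BALANCING_MEMORY_TIERING = 0x2

def normal_enabled(sysctl_val):
    """Original cross-socket balancing enabled?"""
    return bool(sysctl_val & NUMA_BALANCING_NORMAL)

def tiering_enabled(sysctl_val):
    """Hot/cold placement optimization among memory types enabled?"""
    return bool(sysctl_val & NUMA_BALANCING_MEMORY_TIERING)

# Writing 1 keeps the old Boolean behaviour (backward compatible);
# 2 enables only the memory-tiering optimization; 3 enables both.
modes = [(normal_enabled(v), tiering_enabled(v)) for v in (0, 1, 2, 3)]
```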
  10. 22 February 2022 (1 commit)
  11. 21 February 2022 (1 commit)
  12. 12 February 2022 (1 commit)
  13. 14 December 2021 (1 commit)
  14. 17 November 2021 (1 commit)
  15. 30 June 2021 (1 commit)
  16. 18 June 2021 (1 commit)
  17. 18 May 2021 (1 commit)
  18. 15 May 2021 (1 commit)
  19. 14 May 2021 (1 commit)
  20. 12 May 2021 (1 commit)
  21. 09 December 2020 (3 commits)
  22. 13 August 2020 (1 commit)
  23. 29 July 2020 (1 commit)
  24. 06 July 2020 (1 commit)
  25. 27 June 2020 (1 commit)
  26. 20 June 2020 (1 commit)
  27. 09 June 2020 (3 commits)
    • panic: add sysctl to dump all CPUs backtraces on oops event · 60c958d8
      Authored by Guilherme G. Piccoli
      Usually when the kernel reaches an oops condition, it's a point of no
      return; in case not enough debug information is available in the kernel
      splat, one of the last resorts would be to collect a kernel crash dump
      and analyze it.  The problem with this approach is that in order to
      collect the dump, a panic is required (to kexec-load the crash kernel).
      When in an environment of multiple virtual machines, users may prefer to
      try living with the oops, at least until being able to properly shutdown
      their VMs / finish their important tasks.
      
      This patch implements a way to collect a bit more debug details when an
      oops event is reached, by printing all the CPUs backtraces through the
      usage of NMIs (on architectures that support that).  The sysctl added
      (and documented) here is called "oops_all_cpu_backtrace", and when set
      it will (as the name suggests) dump all CPUs' backtraces.
      
      Far from ideal, this may nevertheless be the last option for users that
      for some reason cannot panic on oops.  Most of the time oopses are clear
      enough to indicate the kernel portion that must be investigated, but in
      virtual environments it is possible to observe hypervisor/KVM issues that
      could lead to oopses shown on other guests' CPUs (like virtual APIC
      crashes).  This patch hence aims to help debug such complex issues
      without resorting to kdump.
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: http://lkml.kernel.org/r/20200327224116.21030-1-gpiccoli@canonical.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/hung_task.c: introduce sysctl to print all traces when a hung task is detected · 0ec9dc9b
      Authored by Guilherme G. Piccoli
      Commit 401c636a ("kernel/hung_task.c: show all hung tasks before
      panic") introduced a change in that we started to show all CPUs
      backtraces when a hung task is detected _and_ the sysctl/kernel
      parameter "hung_task_panic" is set.  The idea is good, because usually
      when observing deadlocks (that may lead to hung tasks), the culprit is
      another task holding a lock and not necessarily the task detected as
      hung.
      
      The problem with this approach is that dumping backtraces is a slightly
      expensive task, especially printing them on the console (and especially
      on machines with many CPUs, like the servers commonly found nowadays).
      So, users that plan to collect a kdump to investigate the hung tasks and
      narrow down the deadlock definitely don't need the CPUs' backtraces on
      dmesg/console, which only delay the panic and pollute the log (the crash
      tool can easily grab all CPUs' traces with the 'bt -a' command).
      
      Also, there's the reciprocal scenario: some users may be interested in
      seeing the CPUs' backtraces but don't want the system to panic when a
      hung task is detected.  The current approach hence almost amounts to
      embedding a policy in the kernel, by forcing the CPUs' backtrace dump
      (only) on hung_task_panic.
      
      This patch decouples the panic event on hung task from the CPUs'
      backtrace dump, by creating (and documenting) a new sysctl called
      "hung_task_all_cpu_backtrace", analogous to the approach taken for
      soft/hard lockups, which have both a panic and an "all_cpu_backtrace"
      sysctl to allow individual control.  The new mechanism for dumping the
      CPUs' backtraces on hung task detection respects "hung_task_warnings" by
      not dumping the traces when there are no warnings left.
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Link: http://lkml.kernel.org/r/20200327223646.20779-1-gpiccoli@canonical.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
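The decoupling described above — panic and backtrace dump as independent knobs, with the dump gated on remaining hung_task_warnings — can be sketched as follows. The function and parameter names are hypothetical, not the kernel's:

```python
def hung_task_actions(all_cpu_backtrace, hung_task_panic, warnings_left):
    """On hung-task detection, decide (dump_backtraces, panic).

    The two controls are independent; the dump additionally respects
    hung_task_warnings by staying silent once no warnings are left.
    """
    dump = all_cpu_backtrace and warnings_left > 0
    return dump, hung_task_panic

# Backtraces without panic, panic without backtraces, and a suppressed
# dump once hung_task_warnings is exhausted.
a = hung_task_actions(True, False, warnings_left=5)
b = hung_task_actions(False, True, warnings_left=5)
c = hung_task_actions(True, False, warnings_left=0)
```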
    • kernel: add panic_on_taint · db38d5c1
      Authored by Rafael Aquini
      Analogously to the introduction of panic_on_warn, this patch introduces
      a kernel option named panic_on_taint in order to provide a simple and
      generic way to stop execution and catch a coredump when the kernel gets
      tainted by any given flag.
      
      This is useful for debugging sessions as it avoids having to rebuild the
      kernel to explicitly add calls to panic() into the code sites that
      introduce the taint flags of interest.
      
      For instance, if one is interested in proceeding with a post-mortem
      analysis at the point a given code path is hitting a bad page (i.e.
      unaccount_page_cache_page(), or slab_bug()), a coredump can be collected
      by rebooting the kernel with 'panic_on_taint=0x20' amended to the
      command line.
      
      Another, perhaps less frequent, use for this option would be as a means
      for assuring a security policy case where only a subset of taints, or no
      single taint (in paranoid mode), is allowed for the running system.  The
      optional switch 'nousertaint' is handy in this particular scenario, as
      it will avoid userspace induced crashes by writes to sysctl interface
      /proc/sys/kernel/tainted causing false positive hits for such policies.
      
      [akpm@linux-foundation.org: tweak kernel-parameters.txt wording]
      Suggested-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Bunk <bunk@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Link: http://lkml.kernel.org/r/20200515175502.146720-1-aquini@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
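The mask check behind panic_on_taint can be sketched directly from the 0x20 example above (0x20 is the mask bit of TAINT_BAD_PAGE, bit 5). The function name below is this sketch's own; the kernel does this inside its add_taint path:

```python
TAINT_BAD_PAGE = 5  # bit position; mask bit = 1 << 5 = 0x20

def add_taint(tainted_mask, taint_bit, panic_on_taint_mask):
    """Set a taint bit; report whether panic_on_taint would fire, i.e.
    whether the newly tainted mask intersects the configured mask."""
    tainted_mask |= 1 << taint_bit
    would_panic = bool(tainted_mask & panic_on_taint_mask)
    return tainted_mask, would_panic

# Booting with 'panic_on_taint=0x20': a bad-page taint panics, while an
# unrelated taint (e.g. bit 0, TAINT_PROPRIETARY_MODULE) does not.
_, fires = add_taint(0, TAINT_BAD_PAGE, 0x20)
_, quiet = add_taint(0, 0, 0x20)
```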
  28. 26 May 2020 (2 commits)
  29. 18 May 2020 (1 commit)
  30. 16 May 2020 (1 commit)
  31. 05 May 2020 (1 commit)