1. 08 Apr 2021, 1 commit
    • stack: Optionally randomize kernel stack offset each syscall · 39218ff4
      Authored by Kees Cook
      This provides the ability for architectures to enable kernel stack base
      address offset randomization. This feature is controlled by the boot
      param "randomize_kstack_offset=on/off", with its default value set by
      CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
      
      This feature is based on the original idea from the last public release
      of PaX's RANDKSTACK feature: https://pax.grsecurity.net/docs/randkstack.txt
      All the credit for the original idea goes to the PaX team. Note that
      the design and implementation of this upstream randomize_kstack_offset
      feature differs greatly from the RANDKSTACK feature (see below).
      
      Reasoning for the feature:
      
      This feature aims to make the various stack-based attacks that rely
      on a deterministic stack structure harder to carry out. We have had
      many such attacks in the past (just to name a few):
      
      https://jon.oberheide.org/files/infiltrate12-thestackisback.pdf
      https://jon.oberheide.org/files/stackjacking-infiltrate11.pdf
      https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
      
      As Linux kernel stack protections have been constantly improving
      (vmap-based stack allocation with guard pages, removal of thread_info,
      STACKLEAK), attackers have had to find new ways for their exploits
      to work. They have done so, continuing to rely on the kernel's stack
      determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT
      were not relevant. For example, the following recent attacks would have
      been hampered if the stack offset was non-deterministic between syscalls:
      
      https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf
      (page 70: targeting the pt_regs copy with linear stack overflow)
      
      https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
      (leaked stack address from one syscall as a target during next syscall)
      
      The main idea is that since the stack offset is randomized on each system
      call, it is harder for an attack to reliably land in any particular place
      on the thread stack, even with address exposures, as the stack base will
      change on the next syscall. Also, since randomization is performed after
      placing pt_regs, the ptrace-based approach[1] to discover the randomized
      offset during a long-running syscall should not be possible.
      
      Design description:
      
      During most of the kernel's execution, it runs on the "thread stack",
      which is pretty deterministic in its structure: it is fixed in size,
      and on every syscall entry from userspace, construction of the thread
      stack starts from an address fetched from the per-cpu
      cpu_current_top_of_stack variable. The first element to be pushed to the
      thread stack is the pt_regs struct that stores all required CPU registers
      and syscall parameters. Finally the specific syscall function is called,
      with the stack being used as the kernel executes the resulting request.
      
      The goal of the randomize_kstack_offset feature is to add a random offset
      after the pt_regs has been pushed to the stack and before the rest of the
      thread stack is used during the syscall processing, and to change it every
      time a process issues a syscall. The source of randomness is currently
      architecture-defined (but x86 is using the low byte of rdtsc()). Future
      improvements for different entropy sources are possible, but out of scope
      for this patch. Furthermore, to add more unpredictability, new offsets
      are chosen at the end of syscalls (the timing of which should be less
      easy to measure from userspace than at syscall entry time), and stored
      in a per-CPU variable, so that the life of the value does not stay
      explicitly tied to a single task.
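
      A minimal userspace C model of the offset lifecycle described above (a
      sketch only, not kernel code; names such as pseudo_kstack_offset are
      illustrative): the stored offset is consumed at "syscall entry", and a
      fresh value derived from the low byte of the TSC is mixed in at "syscall
      exit", with a plain global standing in for the per-CPU variable.

          #include <stdint.h>
          #include <stdio.h>
          #include <x86intrin.h>                  /* __rdtsc() */

          /* Stand-in for the kernel's per-CPU offset variable. */
          static uint32_t pseudo_kstack_offset;

          /* "Syscall entry": consume the stored offset, masked to 10 bits
           * (mirroring the "and $0x3ff" visible in the assembly below). */
          static uint32_t offset_for_this_syscall(void)
          {
                  return pseudo_kstack_offset & 0x3FF;
          }

          /* "Syscall exit": mix in fresh entropy (low byte of the TSC), so
           * the value is not refreshed at the easier-to-observe entry time. */
          static void refresh_offset(void)
          {
                  pseudo_kstack_offset ^= (uint32_t)(__rdtsc() & 0xFF);
          }

          int main(void)
          {
                  for (int i = 0; i < 4; i++) {
                          printf("syscall %d would shift the stack by %u bytes\n",
                                 i, offset_for_this_syscall());
                          refresh_offset();
                  }
                  return 0;
          }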
      
      As suggested by Andy Lutomirski, the offset is added using alloca()
      and an empty asm() statement with an output constraint: this avoids
      changes to the assembly syscall entry code and to the unwinder, and it
      provides correct stack alignment as defined by the compiler.
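
      A minimal userspace sketch of that trick (illustrative only, not the
      upstream macro): alloca() moves the stack pointer down by a runtime
      amount, and the empty asm() with an output constraint on the allocation
      keeps the compiler from discarding it; the simulated_syscall() and
      syscall_body() helpers below are hypothetical.

          #include <alloca.h>
          #include <stdint.h>
          #include <stdio.h>

          static void syscall_body(int nr)
          {
                  int local;      /* its address shows where the stack ended up */
                  printf("syscall %d: stack local at %p\n", nr, (void *)&local);
          }

          static void simulated_syscall(int nr, uint32_t offset)
          {
                  /* Consume a runtime-chosen amount of stack before the work. */
                  char *ptr = alloca(offset & 0x3FF);
                  /* Empty asm() with an output constraint: the compiler must
                   * assume the allocation is written, so it stays in place. */
                  asm volatile("" : "=m" (*ptr));
                  syscall_body(nr);
          }

          int main(void)
          {
                  simulated_syscall(0, 0x040);
                  simulated_syscall(1, 0x3a8);
                  return 0;
          }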
      
      In order to make this available by default with zero performance impact
      for those that don't want it, it is boot-time selectable with static
      branches. This way, if the overhead is not wanted, it can just be
      left turned off with no performance impact.
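
      A kernel-style sketch of that boot-time switch (hedged: it mirrors the
      shape of such a static-key setup rather than the exact upstream code,
      and it will not build outside a kernel tree):

          #include <linux/init.h>
          #include <linux/jump_label.h>
          #include <linux/kernel.h>

          /* Off by default; flipped once at boot, so the runtime check is a
           * patched jump/no-op instead of a load-and-branch on the hot path. */
          DEFINE_STATIC_KEY_FALSE(randomize_kstack_offset);

          static int __init early_randomize_kstack_offset(char *buf)
          {
                  bool enable;
                  int err = kstrtobool(buf, &enable);

                  if (err)
                          return err;

                  if (enable)
                          static_branch_enable(&randomize_kstack_offset);
                  else
                          static_branch_disable(&randomize_kstack_offset);
                  return 0;
          }
          early_param("randomize_kstack_offset", early_randomize_kstack_offset);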
      
      The generated assembly for x86_64 with GCC looks like this:
      
      ...
      ffffffff81003977: 65 8b 05 02 ea 00 7f  mov %gs:0x7f00ea02(%rip),%eax
      					    # 12380 <kstack_offset>
      ffffffff8100397e: 25 ff 03 00 00        and $0x3ff,%eax
      ffffffff81003983: 48 83 c0 0f           add $0xf,%rax
      ffffffff81003987: 25 f8 07 00 00        and $0x7f8,%eax
      ffffffff8100398c: 48 29 c4              sub %rax,%rsp
      ffffffff8100398f: 48 8d 44 24 0f        lea 0xf(%rsp),%rax
      ffffffff81003994: 48 83 e0 f0           and $0xfffffffffffffff0,%rax
      ...
      
      As a result of the above stack alignment, this patch introduces about
      5 bits of randomness after pt_regs is spilled to the thread stack on
      x86_64, and 6 bits on x86_32 (since it has 1 fewer bit required for
      stack alignment). The amount of entropy could be adjusted based on how
      much of the stack space we wish to trade for security.
      
      My measurement of syscall performance overhead (on x86_64):
      
      lmbench: /usr/lib/lmbench/bin/x86_64-linux-gnu/lat_syscall -N 10000 null
          randomize_kstack_offset=y	Simple syscall: 0.7082 microseconds
          randomize_kstack_offset=n	Simple syscall: 0.7016 microseconds
      
      So, roughly 0.9% overhead growth for a no-op syscall, which is very
      manageable. And for people who don't want this, it's off by default.
      
      There are two gotchas with using the alloca() trick. First,
      compilers that have Stack Clash protection (-fstack-clash-protection)
      enabled by default (e.g. Ubuntu[3]) add pagesize stack probes to
      any dynamic stack allocations. While the randomization offset is
      always less than a page, the resulting assembly would still contain
      (unreachable!) probing routines, bloating the generated code. To
      avoid this, -fno-stack-clash-protection is unconditionally added to
      the kernel Makefile since this is the only dynamic stack allocation in
      the kernel (now that VLAs have been removed) and it is provably safe
      from Stack Clash style attacks.
      
      The second gotcha with alloca() is a negative interaction with
      -fstack-protector*, in that it sees the alloca() as an array allocation,
      which triggers the unconditional addition of the stack canary function
      pre/post-amble which slows down syscalls regardless of the static
      branch. In order to avoid adding this unneeded check and its associated
      performance impact, architectures need to carefully remove uses of
      -fstack-protector-strong (or -fstack-protector) in the compilation units
      that use the add_random_kstack() macro and to audit the resulting stack
      mitigation coverage (to make sure no desired coverage disappears). No
      change is visible for this on x86 because the stack protector is already
      unconditionally disabled for the compilation unit, but the change is
      required on arm64. There is, unfortunately, no attribute that can be
      used to disable stack protector for specific functions.
      
      Comparison to PaX RANDKSTACK feature:
      
      The RANDKSTACK feature randomizes the location of the stack start
      (cpu_current_top_of_stack), i.e. including the location of pt_regs
      structure itself on the stack. Initially this patch followed the same
      approach, but during the recent discussions[2], it has been determined
      to be of little value since, if ptrace functionality is available for
      an attacker, they can use PTRACE_PEEKUSR/PTRACE_POKEUSR to read/write
      different offsets in the pt_regs struct, observe the cache behavior of
      the pt_regs accesses, and figure out the random stack offset. Another
      difference is that the random offset is stored in a per-cpu variable,
      rather than having it be per-thread. As a result, these implementations
      differ a fair bit in their details and results, though
      obviously the intent is similar.
      
      [1] https://lore.kernel.org/kernel-hardening/2236FBA76BA1254E88B949DDB74E612BA4BC57C1@IRSMSX102.ger.corp.intel.com/
      [2] https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/
      [3] https://lists.ubuntu.com/archives/ubuntu-devel/2019-June/040741.html
      Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20210401232347.2791257-4-keescook@chromium.org
  2. 27 Feb 2021, 1 commit
  3. 25 Feb 2021, 1 commit
  4. 17 Feb 2021, 1 commit
  5. 15 Feb 2021, 1 commit
  6. 12 Feb 2021, 1 commit
  7. 09 Feb 2021, 5 commits
  8. 29 Jan 2021, 2 commits
  9. 28 Jan 2021, 1 commit
    • perf/intel: Remove Perfmon-v4 counter_freezing support · 3daa96d6
      Authored by Peter Zijlstra
      Perfmon-v4 counter freezing is fundamentally broken; remove this
      default-disabled code to make sure nobody uses it.
      
      The feature is called Freeze-on-PMI in the SDM, and if it actually did
      that, there wouldn't be a problem; *however*, it does something subtly
      different: it globally disables the whole PMU when it raises the PMI,
      not when the PMI hits.
      
      This means there's a window between the PMI getting raised and the PMI
      actually getting served where we lose events, and this violates
      perf counter independence. That is, a counting event should not result
      in a different event count when there is a sampling event co-scheduled.
      
      This is known to break existing software (RR).
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
  10. 27 Jan 2021, 1 commit
  11. 15 Jan 2021, 1 commit
  12. 14 Jan 2021, 1 commit
  13. 13 Jan 2021, 1 commit
  14. 08 Jan 2021, 1 commit
  15. 07 Jan 2021, 3 commits
  16. 05 Jan 2021, 3 commits
    • rcu: Enable rcu_normal_after_boot unconditionally for RT · 36221e10
      Authored by Julia Cartwright
      Expedited RCU grace periods send IPIs to all non-idle CPUs, and thus can
      disrupt time-critical code in real-time applications.  However, there
      is a portion of boot-time processing (presumably before any real-time
      applications have started) where expedited RCU grace periods are the only
      option.  And so it is that experience with the -rt patchset indicates that
      PREEMPT_RT systems should always set the rcupdate.rcu_normal_after_boot
      kernel boot parameter.
      
      This commit therefore makes the post-boot application environment safe
      for real-time applications by making PREEMPT_RT systems disable the
      rcupdate.rcu_normal_after_boot kernel boot parameter and act as
      if this parameter had been set.  This means that post-boot calls to
      synchronize_rcu_expedited() will be treated as if they were instead
      calls to synchronize_rcu(), thus preventing the IPIs, and thus avoiding
      disrupting real-time applications.
      Suggested-by: Luiz Capitulino <lcapitulino@redhat.com>
      Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
      Signed-off-by: Julia Cartwright <julia@ni.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      [ paulmck: Update kernel-parameters.txt accordingly. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
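
      A hedged sketch, in kernel C, of what this behavior change can look like
      (not necessarily the exact upstream hunk): the existing parameter
      defaults to enabled when CONFIG_PREEMPT_RT is set, and its module_param()
      registration is compiled out so it cannot be overridden at boot.

          /* Sketch: default on for PREEMPT_RT, and hide the parameter. */
          static int rcu_normal_after_boot = IS_ENABLED(CONFIG_PREEMPT_RT);
          #ifndef CONFIG_PREEMPT_RT
          module_param(rcu_normal_after_boot, int, 0444);
          #endif
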
    • rcu: Unconditionally use rcuc threads on PREEMPT_RT · 8b9a0ecc
      Authored by Scott Wood
      PREEMPT_RT systems have long used the rcutree.use_softirq kernel
      boot parameter to avoid use of RCU_SOFTIRQ handlers, which can disrupt
      real-time applications by invoking callbacks during return from interrupts
      that arrived while executing time-critical code.  This kernel boot
      parameter instead runs RCU core processing in an 'rcuc' kthread, thus
      allowing the scheduler to do its job of avoiding disrupting time-critical
      code.
      
      This commit therefore disables the rcutree.use_softirq kernel boot
      parameter on PREEMPT_RT systems, thus forcing such systems to do RCU
      core processing in 'rcuc' kthreads.  This approach has long been in
      use by users of the -rt patchset, and there have been no complaints.
      There is therefore no way for the system administrator to override this
      choice, at least without modifying and rebuilding the kernel.
      Signed-off-by: Scott Wood <swood@redhat.com>
      [bigeasy: Reword commit message]
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      [ paulmck: Update kernel-parameters.txt accordingly. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
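
      The analogous hedged sketch for this commit (again not necessarily the
      exact upstream hunk): rcutree.use_softirq defaults to false on
      PREEMPT_RT, and the parameter is compiled out so it cannot be flipped
      back at boot.

          /* Sketch: force rcuc kthreads on PREEMPT_RT, and hide the parameter. */
          static bool use_softirq = !IS_ENABLED(CONFIG_PREEMPT_RT);
          #ifndef CONFIG_PREEMPT_RT
          module_param(use_softirq, bool, 0444);
          #endif
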
    • doc: Remove obsolete rcutree.rcu_idle_lazy_gp_delay boot parameter · 2252ec14
      Authored by Paul E. McKenney
      This commit removes documentation for the rcutree.rcu_idle_lazy_gp_delay
      kernel boot parameter given that this parameter no longer exists.
      
      Fixes: 77a40f97 ("rcu: Remove kfree_rcu() special casing and lazy-callback handling")
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
  17. 10 Dec 2020, 1 commit
  18. 04 Dec 2020, 1 commit
  19. 01 Dec 2020, 1 commit
  20. 25 Nov 2020, 1 commit
  21. 19 Nov 2020, 2 commits
    • powerpc/64s: flush L1D after user accesses · 9a32a7e7
      Authored by Nicholas Piggin
      IBM Power9 processors can speculatively operate on data in the L1 cache
      before it has been completely validated, via a way-prediction mechanism. It
      is not possible for an attacker to determine the contents of impermissible
      memory using this method, since these systems implement a combination of
      hardware and software security measures to prevent scenarios where
      protected data could be leaked.
      
      However these measures don't address the scenario where an attacker induces
      the operating system to speculatively execute instructions using data that
      the attacker controls. This can be used for example to speculatively bypass
      "kernel user access prevention" techniques, as discovered by Anthony
      Steinhauser of Google's Safeside Project. This is not an attack by itself,
      but there is a possibility it could be used in conjunction with
      side-channels or other weaknesses in the privileged code to construct an
      attack.
      
      This issue can be mitigated by flushing the L1 cache between privilege
      boundaries of concern. This patch flushes the L1 cache after user accesses.
      
      This is part of the fix for CVE-2020-4788.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: flush L1D on kernel entry · f7964378
      Authored by Nicholas Piggin
      IBM Power9 processors can speculatively operate on data in the L1 cache
      before it has been completely validated, via a way-prediction mechanism. It
      is not possible for an attacker to determine the contents of impermissible
      memory using this method, since these systems implement a combination of
      hardware and software security measures to prevent scenarios where
      protected data could be leaked.
      
      However these measures don't address the scenario where an attacker induces
      the operating system to speculatively execute instructions using data that
      the attacker controls. This can be used for example to speculatively bypass
      "kernel user access prevention" techniques, as discovered by Anthony
      Steinhauser of Google's Safeside Project. This is not an attack by itself,
      but there is a possibility it could be used in conjunction with
      side-channels or other weaknesses in the privileged code to construct an
      attack.
      
      This issue can be mitigated by flushing the L1 cache between privilege
      boundaries of concern. This patch flushes the L1 cache on kernel entry.
      
      This is part of the fix for CVE-2020-4788.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  22. 17 Nov 2020, 1 commit
  23. 31 Oct 2020, 1 commit
  24. 23 Oct 2020, 1 commit
  25. 20 Oct 2020, 1 commit
    • xen/events: defer eoi in case of excessive number of events · e99502f7
      Authored by Juergen Gross
      If rogue guests send events at high frequency, it might happen that
      xen_evtchn_do_upcall() never stops processing events in dom0. As this
      is done in IRQ handling, a crash might be the result.
      
      In order to avoid that, delay further inter-domain events after some
      time in xen_evtchn_do_upcall() by forcing eoi processing into a
      worker on the same cpu, thus inhibiting new events coming in.
      
      The time after which eoi processing is to be delayed is configurable
      via a new module parameter "event_loop_timeout" which specifies the
      maximum event loop time in jiffies (default: 2, the value was chosen
      after some tests showing that a value of 2 was the lowest with only a
      slight drop of dom0 network throughput while multiple guests
      performed an event storm).
      
      How long eoi processing will be delayed can be specified via another
      parameter "event_eoi_delay" (again in jiffies, default 10, again the
      value was chosen after testing with different delay values).
      
      This is part of XSA-332.
      
      Cc: stable@vger.kernel.org
      Reported-by: Julien Grall <julien@xen.org>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
      Reviewed-by: Wei Liu <wl@xen.org>
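
      A hedged sketch of how the two knobs described above can be exposed as
      module parameters (jiffies-valued, with the defaults given in the text;
      the exact upstream declarations and permissions may differ).

          #include <linux/moduleparam.h>

          /* Sketch: tunables described above, both in jiffies. */
          static uint event_loop_timeout = 2;
          module_param(event_loop_timeout, uint, 0644);

          static uint event_eoi_delay = 10;
          module_param(event_eoi_delay, uint, 0644);
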
  26. 17 Oct 2020, 1 commit
  27. 06 Oct 2020, 1 commit
  28. 25 Sep 2020, 3 commits