- 01 July 2022, 1 commit
-
-
Committed by Marc Zyngier
In order to be able to completely disable SME even if the HW seems to support it (most likely because the FW is broken), move the SME setup into the EL2 finalisation block, and use a new idreg override to deal with it. Note that we also nuke id_aa64smfr0_el1 as a byproduct.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-8-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
-
- 09 June 2022, 1 commit
-
-
Committed by Will Deacon
Ignore 'kvm-arm.mode=protected' when using VHE so that kvm_get_mode() only returns KVM_MODE_PROTECTED on systems where the feature is available.

Cc: David Brazdil <dbrazdil@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220609121223.2551-4-will@kernel.org
-
- 23 May 2022, 2 commits
-
-
Committed by Ahmad Fatoum
The Cryptographic Acceleration and Assurance Module (CAAM) is an IP core built into many newer i.MX and QorIQ SoCs by NXP. The CAAM does crypto acceleration, hardware number generation and has a blob mechanism for encapsulation/decapsulation of sensitive material. This blob mechanism depends on a device-specific random 256-bit One Time Programmable Master Key that is fused into each SoC at manufacturing time. This key is unreadable and can only be used by the CAAM for AES encryption/decryption of user data. This makes it a suitable backend (source) for kernel trusted keys.

Previous commits generalized trusted keys to support multiple backends and added an API to access the CAAM blob mechanism. Based on these, provide the necessary glue to use the CAAM for trusted keys.

Reviewed-by: David Gstir <david@sigma-star.at>
Reviewed-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Tim Harvey <tharvey@gateworks.com>
Tested-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Tested-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Tested-by: Michael Walle <michael@walle.cc> # on ls1028a (non-E and E)
Tested-by: John Ernberg <john.ernberg@actia.se> # iMX8QXP
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
-
Committed by Ahmad Fatoum
The two existing trusted key sources don't make use of the kernel RNG; instead, the hardware that does the sealing/unsealing also generates the random key material. However, both users and future backends may want to place less trust in the quality of the trust source's random number generator and instead reuse the kernel entropy pool, which can be seeded from multiple entropy sources.

Make this possible by adding a new trusted.rng parameter that will force use of the kernel RNG. In its absence, it's up to the trust source to decide which random numbers to use, maintaining the existing behavior.

Suggested-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Sumit Garg <sumit.garg@linaro.org>
Acked-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Reviewed-by: David Gstir <david@sigma-star.at>
Reviewed-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Tested-by: Michael Walle <michael@walle.cc> # on ls1028a (non-E and E)
Tested-by: John Ernberg <john.ernberg@actia.se> # iMX8QXP
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
-
- 21 May 2022, 1 commit
-
-
Committed by Pawan Gupta
Processor MMIO Stale Data is a class of vulnerabilities that may expose data after an MMIO operation. For details please refer to Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst.

These vulnerabilities are broadly categorized as:

Device Register Partial Write (DRPW): Some endpoint MMIO registers incorrectly handle writes that are smaller than the register size. Instead of aborting the write or only copying the correct subset of bytes (for example, 2 bytes for a 2-byte write), more bytes than specified by the write transaction may be written to the register. On some processors, this may expose stale data from the fill buffers of the core that created the write transaction.

Shared Buffers Data Sampling (SBDS): After propagators may have moved data around the uncore and copied stale data into client core fill buffers, processors affected by MFBDS can leak data from the fill buffer.

Shared Buffers Data Read (SBDR): Similar to Shared Buffers Data Sampling (SBDS), except that the data is directly read into the architectural software-visible state.

An attacker can use these vulnerabilities to extract data from CPU fill buffers using MDS and TAA methods. Mitigate it by clearing the CPU fill buffers using the VERW instruction before returning to a user or a guest.

On CPUs not affected by MDS and TAA, user applications cannot sample data from CPU fill buffers using MDS or TAA. A guest with MMIO access can still use DRPW or SBDR to extract data architecturally. Mitigate it with the VERW instruction to clear fill buffers before VMENTER for MMIO-capable guests.

Add a kernel parameter mmio_stale_data={off|full|full,nosmt} to control the mitigation.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
-
- 20 May 2022, 3 commits
-
-
Committed by Hans de Goede
Some firmware supplies PCI host bridge _CRS that includes address space unusable by PCI devices, e.g., space occupied by host bridge registers or used by hidden PCI devices.

To avoid this unusable space, Linux currently excludes E820 reserved regions from _CRS windows; see 4dc2287c ("x86: avoid E820 regions when allocating address space").

However, this use of E820 reserved regions to clip things out of _CRS is not supported by ACPI, UEFI, or PCI Firmware specs, and some systems have E820 reserved regions that cover the entire memory window from _CRS. 4dc2287c clips the entire window, leaving no space for hot-added or uninitialized PCI devices.

For example, from a Lenovo IdeaPad 3 15IIL 81WE:

  BIOS-e820: [mem 0x4bc50000-0xcfffffff] reserved
  pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
  pci 0000:00:15.0: BAR 0: [mem 0x00000000-0x00000fff 64bit]
  pci 0000:00:15.0: BAR 0: no space for [mem size 0x00001000 64bit]

Future patches will add quirks to enable/disable E820 clipping automatically.

Add a "pci=no_e820" kernel command line option to disable clipping with E820 reserved regions. Also add a matching "pci=use_e820" option to enable clipping with E820 reserved regions if that has been disabled by default by further patches in this patch-set.

Both options taint the kernel because they are intended for debugging and workaround purposes until a quirk can set them automatically.

[bhelgaas: commit log, add printk]
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 Lenovo IdeaPad 3
Link: https://lore.kernel.org/r/20220519152150.6135-2-hdegoede@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Benoit Grégoire <benoitg@coeus.ca>
Cc: Hui Wang <hui.wang@canonical.com>
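To make the clipping problem concrete, here is a small, self-contained C sketch (illustrative only, not the kernel's E820 clipping code, which handles many more cases) that intersects the _CRS window from the example above with the E820 reserved region; since the reserved region covers the whole window, nothing is left for BAR assignment:

  #include <stdio.h>
  #include <inttypes.h>

  /* Toy illustration: clip the host bridge window from the Lenovo IdeaPad 3
   * 15IIL example against the E820 reserved region quoted above. */
  int main(void)
  {
      uint64_t win_start = 0x65400000, win_end = 0xbfffffff;   /* _CRS window   */
      uint64_t res_start = 0x4bc50000, res_end = 0xcfffffff;   /* E820 reserved */

      if (res_start <= win_start && res_end >= win_end)
          printf("window fully clipped: no space left for PCI BARs\n");
      else if (res_start > win_start)
          printf("usable: [mem %#" PRIx64 "-%#" PRIx64 "]\n",
                 win_start, res_start - 1);
      else
          printf("usable: [mem %#" PRIx64 "-%#" PRIx64 "]\n",
                 res_end + 1, win_end);
      return 0;
  }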
-
Committed by Saravana Kannan
The deferred probe timer that's used for this currently starts at late_initcall and runs for driver_deferred_probe_timeout seconds, the assumption being that all available drivers would be loaded and registered before the timer expires. This means the driver_deferred_probe_timeout has to be pretty large to cover the worst case. But if we set the default value to cover the worst case, it would significantly slow down the average case. For this reason, the default value is set to 0.

Also, with CONFIG_MODULES=y and the current default values of driver_deferred_probe_timeout=0 and fw_devlink=on, devices with missing drivers will cause their consumer devices to always defer their probes. This is because device links created by fw_devlink defer the probe even before the consumer driver's probe() is called.

Instead of a fixed timeout, if we extend an unexpired deferred probe timer on every successful driver registration, with the expectation that more modules will be loaded in the near future, then the default value of driver_deferred_probe_timeout only needs to be as long as the worst case time difference between two consecutive module loads. So let's implement that and set the default value to 10 seconds when CONFIG_MODULES=y.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Kevin Hilman <khilman@kernel.org>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Cc: linux-gpio@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20220429220933.1350374-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
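A rough, userspace-only sketch of the sliding-deadline idea described above (the helper name and the simulated timeline are made up for illustration; this is not the driver core implementation): every successful driver registration pushes an unexpired deadline out by another driver_deferred_probe_timeout interval, so the timeout only has to cover the gap between consecutive module loads rather than the whole boot:

  #include <stdio.h>

  #define PROBE_TIMEOUT_SECS 10   /* default when CONFIG_MODULES=y, per the commit */

  static long deadline;           /* absolute time (s) at which deferred probing gives up */

  /* Hypothetical helper: called after every successful driver registration. */
  static void extend_deferred_probe_deadline(long now)
  {
      if (deadline > now)                      /* only extend an unexpired timer */
          deadline = now + PROBE_TIMEOUT_SECS;
  }

  int main(void)
  {
      long now = 0;
      long loads[] = { 4, 9, 13 };             /* simulated module loads at t=4s, 9s, 13s */

      deadline = now + PROBE_TIMEOUT_SECS;     /* armed at late_initcall time */
      for (int i = 0; i < 3; i++) {
          now = loads[i];
          extend_deferred_probe_deadline(now);
          printf("driver registered at t=%lds, deadline now t=%lds\n", now, deadline);
      }
      return 0;
  }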
-
Committed by Saravana Kannan
There's currently no way to use the driver_async_probe kernel cmdline param to enable default async probe for all drivers. So, add support for "*" to match all driver names. When "*" is used, all other drivers listed in driver_async_probe are drivers that will NOT match the "*". For example:

* driver_async_probe=drvA,drvB,drvC
  drvA, drvB and drvC do asynchronous probing.
* driver_async_probe=*
  All drivers do asynchronous probing except those that have set the PROBE_FORCE_SYNCHRONOUS flag.
* driver_async_probe=*,drvA,drvB,drvC
  All drivers do asynchronous probing except drvA, drvB, drvC and those that have set the PROBE_FORCE_SYNCHRONOUS flag.

Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Feng Tang <feng.tang@intel.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20220504005344.117803-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
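The matching rule above can be illustrated with a small standalone C sketch (an approximation of the semantics, not the kernel's implementation): a leading "*" inverts the meaning of the remaining names from "probe these asynchronously" to "probe everything asynchronously except these":

  #include <stdio.h>
  #include <string.h>

  /* Illustrative re-implementation of the cmdline semantics described above. */
  static int wants_async(const char *list, const char *drv)
  {
      int all = (list[0] == '*');              /* "*" or "*,drvA,..." */
      char buf[256], *tok, *save;
      int listed = 0;

      strncpy(buf, list, sizeof(buf) - 1);
      buf[sizeof(buf) - 1] = '\0';
      for (tok = strtok_r(buf, ",", &save); tok; tok = strtok_r(NULL, ",", &save))
          if (strcmp(tok, drv) == 0)
              listed = 1;

      return all ? !listed : listed;
  }

  int main(void)
  {
      printf("%d\n", wants_async("drvA,drvB,drvC", "drvA"));   /* 1: listed, async     */
      printf("%d\n", wants_async("*", "drvX"));                /* 1: everything async  */
      printf("%d\n", wants_async("*,drvA,drvB,drvC", "drvA")); /* 0: excluded from "*" */
      return 0;
  }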
-
- 17 May 2022, 1 commit
-
-
Committed by Zhen Lei
When "crashkernel=X,high" is specified, the specified "crashkernel=Y,low" memory is not required in the following corner cases:

1. If both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, it means that the devices can access any memory.
2. If the system memory is small, the crash high memory may be allocated from the DMA zones. If that happens, there's no need to allocate another crash low memory because there's already one.

Add the condition '(crash_base >= CRASH_ADDR_LOW_MAX)' to determine whether the 'high' memory is allocated above the DMA zones. Note: when both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the entire physical memory is DMA accessible, and CRASH_ADDR_LOW_MAX equals 'PHYS_MASK + 1'.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/20220511032033.426-1-thunder.leizhen@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 14 May 2022, 1 commit
-
-
Committed by Muchun Song
Use kstrtobool rather than open coding "on" and "off" parsing in mm/hugetlb_vmemmap.c, which is more powerful, as it handles all kinds of spellings such as 'Yy1Nn0' or [oO][NnFf] for "on" and "off".

Link: https://lkml.kernel.org/r/20220512041142.39501-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
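For illustration, here is a rough userspace approximation of the accepted spellings mentioned above ('Yy1Nn0' plus [oO][NnFf]); it is only a sketch and differs in details from the kernel's actual kstrtobool():

  #include <stdio.h>

  /* Rough approximation of the spellings described above, not kernel code. */
  static int parse_bool(const char *s, int *res)
  {
      switch (s[0]) {
      case 'y': case 'Y': case '1': *res = 1; return 0;
      case 'n': case 'N': case '0': *res = 0; return 0;
      case 'o': case 'O':
          switch (s[1]) {
          case 'n': case 'N': *res = 1; return 0;
          case 'f': case 'F': *res = 0; return 0;
          }
          /* fall through */
      default:
          return -1;
      }
  }

  int main(void)
  {
      const char *inputs[] = { "on", "Off", "Y", "0", "maybe" };

      for (int i = 0; i < 5; i++) {
          int v, err = parse_bool(inputs[i], &v);
          printf("%-5s -> %s\n", inputs[i], err ? "invalid" : (v ? "true" : "false"));
      }
      return 0;
  }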
-
- 13 May 2022, 1 commit
-
-
Committed by Xiao Yang
The mitigations= cmdline option help text misses the srbds=off option. Add it.

[ bp: Add a commit message. ]

Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220513101637.216487-1-yangx.jy@fujitsu.com
-
- 12 May 2022, 1 commit
-
-
Committed by Uladzislau Rezki
Currently both expedited and regular grace period stall warnings use a single timeout value with units of seconds. However, recent Android use cases require a sub-100-millisecond expedited RCU CPU stall warning. Given that expedited RCU grace periods normally complete in far less than a single millisecond, especially for small systems, this is not unreasonable.

Therefore introduce the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT kernel configuration that defaults to 20 msec on Android and remains the same as that of the non-expedited stall warnings otherwise. It can also be changed at run time via: /sys/.../parameters/rcu_exp_cpu_stall_timeout.

[ paulmck: Default of zero to use CONFIG_RCU_STALL_TIMEOUT. ]

Signed-off-by: Uladzislau Rezki <uladzislau.rezki@sony.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-
- 10 May 2022, 1 commit
-
-
Committed by Randy Dunlap
Drop the ide-* command line options. Drop the cdrom/ide-cd.rst documentation.

Fixes: 898ee22c ("Drop Documentation/ide/")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Phillip Potter <phil@philpotter.co.uk>
Link: https://lore.kernel.org/r/20220424033701.7988-1-rdunlap@infradead.org
[jc: also deleted reference from cdrom/index.rst]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
- 09 May 2022, 1 commit
-
-
Committed by Damien Le Moal
Clean up the text describing the libata.force boot parameter and update the list of values to include all of the supported horkage and link flags that can be forced.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
-
- 08 May 2022, 1 commit
-
-
Committed by Zhen Lei
arm64 has now added support for "crashkernel=X,high" and "crashkernel=Y,low". Unlike x86, crash low memory is not allocated if "crashkernel=Y,low" is not specified.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/20220506114402.365-7-thunder.leizhen@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 05 May 2022, 1 commit
-
-
Committed by Mimi Zohar
In preparation for differentiating between unsigned regular IMA file hashes and fs-verity's file digests in the IMA measurement list, define a new template field named 'd-ngv2'. Also define two new templates named 'ima-ngv2' and 'ima-sigv2', which include the new 'd-ngv2' field.

Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
-
- 04 May 2022, 1 commit
-
-
Committed by Paul E. McKenney
This commit adds a srcutree.convert_to_big option of zero that causes SRCU to decide at boot whether to wait for contention (small systems) or to immediately expand to large (large systems). A new srcutree.big_cpu_lim parameter (defaulting to 128) defines how many CPUs constitute a large system.

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
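The boot-time choice described above can be pictured with a short sketch (an assumption-laden illustration: it treats "at least big_cpu_lim CPUs" as a large system, which may differ from the real SRCU code):

  #include <stdio.h>

  #define BIG_CPU_LIM_DEFAULT 128   /* srcutree.big_cpu_lim default, per the commit */

  /* Illustrative sketch of the boot-time decision described above. */
  static const char *srcu_size_choice(int convert_to_big, int nr_cpus, int big_cpu_lim)
  {
      if (convert_to_big != 0)
          return "explicit convert_to_big setting wins";
      return nr_cpus >= big_cpu_lim ? "start big (allocate srcu_node tree)"
                                    : "start small (wait for contention)";
  }

  int main(void)
  {
      printf("8 CPUs:   %s\n", srcu_size_choice(0,   8, BIG_CPU_LIM_DEFAULT));
      printf("256 CPUs: %s\n", srcu_size_choice(0, 256, BIG_CPU_LIM_DEFAULT));
      return 0;
  }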
-
- 29 April 2022, 1 commit
-
-
Committed by Muchun Song
The word "free" is not expressive enough to describe the feature of optimizing the vmemmap pages associated with each HugeTLB page, so rename this keyword to "optimize". This patch also cleans up the related configs to make the code more expressive.

Link: https://lkml.kernel.org/r/20220404074652.68024-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 21 April 2022, 1 commit
-
-
Committed by Paul E. McKenney
Debugging of problems involving insanely long-running SMI handlers proceeds better if the CSD-lock timeout can be adjusted. This commit therefore provides a new smp.csd_lock_timeout kernel boot parameter that specifies the timeout in milliseconds. The default remains at the previously hard-coded value of five seconds.

[ paulmck: Apply feedback from Juergen Gross. ]

Cc: Rik van Riel <riel@surriel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-
- 16 April 2022, 4 commits
-
-
Committed by Randy Dunlap
Alphabetize several of the kernel boot parameters in kernel-parameters.txt.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
Committed by Randy Dunlap
Move some out-of-place kernel parameters into their correct locations, and move one out-of-order keyword/legend in kernel-parameters.rst. Add some missing keyword legends in kernel-parameters.rst (HIBERNATION, HYPER_V) and drop some obsolete/removed keyword legends (EIDE, IOSCHED, OSS, TS, XT). Correct the location of the setup.h file.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
Committed by Randy Dunlap
Clean up some of admin-guide/kernel-parameters.txt:

a. "smt" should be "smt=" (S390)
b. (dropped)
c. Sparc supports the vdso= boot option
d. make the tp_printk options (2) formatting similar to other options by adding spacing
e. add "trace_clock=" with a reference to Documentation/trace/ftrace.rst
f. use [IA-64] as documented instead of [ia64]
g. fix formatting and text for test_suspend=
h. fix formatting for swapaccount=
i. fix formatting and grammar for video.brightness_switch_enabled=

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-ia64@vger.kernel.org
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-pm@vger.kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
Committed by Akihiko Odaki
x86 EFI earlyprintk was removed with commit 69c1f396 ("efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation").

Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
- 12 April 2022, 3 commits
-
-
Committed by Paul E. McKenney
RCU-tasks stall-warning messages are printed after the grace period is ten minutes old. Unfortunately, most of us will have rebooted the system in response to an apparently-hung command long before the ten minutes are up, and will thus see what looks to be a silent hang.

This commit therefore adds pr_info() messages that are printed earlier. These should avoid being classified as errors, but should give impatient users a hint. They are controlled by the new rcupdate.rcu_task_stall_info and rcupdate.rcu_task_stall_info_mult kernel-boot parameters. The former defines the initial delay in jiffies (defaulting to 10 seconds) and the latter defines the multiplier (defaulting to 3). Thus, by default, the first message will appear 10 seconds into the RCU-tasks grace period, the second 40 seconds in, and the third 160 seconds in. There would be a fourth at 640 seconds in, but the stall warning message appears 600 seconds in, and once a stall warning is printed for a given grace period, no further informational messages are printed.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
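The schedule quoted above can be reproduced with a small worked example, assuming each new informational delay is the already-elapsed time multiplied by rcu_task_stall_info_mult (which is consistent with the 10 s / 40 s / 160 s sequence in the text):

  #include <stdio.h>

  int main(void)
  {
      int stall_warn = 600;   /* stall warning appears 600 s in, per the commit */
      int info = 10;          /* rcu_task_stall_info default, in seconds        */
      int mult = 3;           /* rcu_task_stall_info_mult default               */

      /* Informational messages stop once the real stall warning fires. */
      for (int t = info; t < stall_warn; t = t + t * mult)
          printf("info message at %3d s into the grace period\n", t);
      printf("stall warning at %d s\n", stall_warn);
      return 0;
  }

Run as written, this prints messages at 10, 40 and 160 seconds, then the stall warning at 600 seconds; the would-be fourth message at 640 seconds never appears.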
-
Committed by Paul E. McKenney
This commit instruments the acquisitions of the srcu_struct structure's ->lock, enabling the initiation of a transition from SRCU_SIZE_SMALL to SRCU_SIZE_BIG when sufficient contention is experienced. The instrumentation counts the number of trylock failures within the confines of a single jiffy. If that number exceeds the value specified by the srcutree.small_contention_lim kernel boot parameter (which defaults to 100), and if the value specified by the srcutree.convert_to_big kernel boot parameter has the 0x10 bit set (defaults to 0), then a transition will be automatically initiated.

By default, there will never be any transitions, so that none of the srcu_struct structures ever gains an srcu_node array.

The useful values for srcutree.convert_to_big are:

0x00: Never convert.
0x01: Always convert at init_srcu_struct() time.
0x02: Convert when rcutorture prints its first round of statistics.
0x03: Decide conversion approach at boot given system size.
0x10: Convert if contention is encountered.
0x12: Convert if contention is encountered or when rcutorture prints its first round of statistics, whichever comes first.

The value 0x11 acts the same as 0x01 because the conversion happens before there is any chance of contention.

[ paulmck: Apply "static" feedback from kernel test robot. ]

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
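A toy sketch of the trigger described above (illustrative only, not the SRCU locking code; the struct and function names are made up): trylock failures are counted within a single jiffy, and once the count exceeds srcutree.small_contention_lim while the 0x10 bit of srcutree.convert_to_big is set, a transition is initiated:

  #include <stdio.h>
  #include <stdbool.h>

  #define SMALL_CONTENTION_LIM 100   /* srcutree.small_contention_lim default */
  #define CONVERT_IF_CONTENDED 0x10  /* convert_to_big bit described above    */

  struct contention_state {
      unsigned long window_jiffy;    /* jiffy in which counting started    */
      unsigned int trylock_fails;    /* failures seen within that jiffy    */
  };

  /* Returns true when a SMALL -> BIG transition should be initiated. */
  static bool note_trylock_failure(struct contention_state *s, unsigned long now_jiffy,
                                   unsigned int convert_to_big)
  {
      if (s->window_jiffy != now_jiffy) {   /* new jiffy: restart the count */
          s->window_jiffy = now_jiffy;
          s->trylock_fails = 0;
      }
      if (++s->trylock_fails <= SMALL_CONTENTION_LIM)
          return false;
      return convert_to_big & CONVERT_IF_CONTENDED;
  }

  int main(void)
  {
      struct contention_state s = { 0, 0 };
      bool convert = false;

      for (int i = 0; i < 150 && !convert; i++)   /* 150 failures in one jiffy */
          convert = note_trylock_failure(&s, 1000, 0x10);
      printf("conversion triggered: %s\n", convert ? "yes" : "no");
      return 0;
  }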
-
Committed by Paul E. McKenney
This commit adds an srcu_tree.convert_to_big kernel parameter that either refuses to convert at all (0), converts immediately at init_srcu_struct() time (1), or lets rcutorture convert it (2). An additional contention-based dynamic conversion choice will be added, along with documentation.

[ paulmck: Apply callback-scanning feedback from Neeraj Upadhyay. ]

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-
- 07 April 2022, 1 commit
-
-
Committed by Michael Roth
For debugging purposes it is very useful to have a way to see the full contents of the SNP CPUID table provided to a guest. Add an sev=debug kernel command-line option to do so. Also introduce some infrastructure so that additional options can be specified via sev=option1[,option2] over time in a consistent manner.

[ bp: Massage, simplify string parsing. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220307213356.2797205-41-brijesh.singh@amd.com
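A minimal userspace sketch of the comma-separated option handling described above (the option name "frobnicate" and the parsing details are made up for illustration; the real parsing lives in the x86 SEV code and differs):

  #include <stdio.h>
  #include <string.h>

  static int sev_cfg_debug;   /* set by "debug", e.g. to dump the SNP CPUID table */

  /* Illustrative parser for "sev=option1[,option2]". */
  static void parse_sev_options(const char *arg)
  {
      char buf[128], *tok, *save;

      strncpy(buf, arg, sizeof(buf) - 1);
      buf[sizeof(buf) - 1] = '\0';
      for (tok = strtok_r(buf, ",", &save); tok; tok = strtok_r(NULL, ",", &save)) {
          if (strcmp(tok, "debug") == 0)
              sev_cfg_debug = 1;
          else
              printf("unknown sev option: %s\n", tok);
      }
  }

  int main(void)
  {
      parse_sev_options("debug,frobnicate");   /* "frobnicate" is a made-up option */
      printf("sev debug enabled: %d\n", sev_cfg_debug);
      return 0;
  }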
-
- 04 April 2022, 6 commits
-
-
Committed by Borislav Petkov
Not really needed anymore, and there's clearcpuid=.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220127115626.14179-7-bp@alien8.de
-
Committed by Borislav Petkov
It doesn't make any sense to disable non-executable mappings - security-wise or otherwise. So rip out that switch, move the remaining code into setup.c and delete setup_nx.c.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220127115626.14179-6-bp@alien8.de
-
Committed by Borislav Petkov
There should be no need to disable SMEP anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220127115626.14179-5-bp@alien8.de
-
Committed by Borislav Petkov
Those were added as part of the SMAP enablement, but SMAP is now an integral part of the kernel proper and there's no need to disable it anymore. Rip out that functionality. Leave --uaccess default on for objtool, as this is what objtool should do by default anyway. If still needed - clearcpuid=smap.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220127115626.14179-4-bp@alien8.de
-
Committed by Borislav Petkov
That chicken bit was added by 4f886511 ("[PATCH] i386: allow disabling X86_FEATURE_SEP at boot"), but measuring int80 vsyscall performance on 32-bit doesn't matter anymore. If still needed, one can boot with clearcpuid=sep to disable that feature for testing.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220127115626.14179-3-bp@alien8.de
-
Committed by Borislav Petkov
Having to give the X86_FEATURE array indices in order to disable a feature bit for testing is not really user-friendly, so accept the feature bit names too. Some feature bits don't have names, so there the array indices are still accepted, of course. Clearing CPUID flags is not something which should be done in production, so taint the kernel too.

An exemplary cmdline would then be something like:

  clearcpuid=de,440,smca,succory,bmi1,3dnow ("succory" is wrong on purpose)

And it says:

  [ ... ] Clearing CPUID bits: de 13:24 smca (unknown: succory) bmi1 3dnow

[ Fix CONFIG_X86_FEATURE_NAMES=n build error as reported by the 0day robot: https://lore.kernel.org/r/202203292206.ICsY2RKX-lkp@intel.com ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220127115626.14179-2-bp@alien8.de
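A toy illustration of the name-or-index acceptance described above (the feature table here is a made-up stub rather than the kernel's X86_FEATURE list): each token is first tried as a numeric bit index, then as a feature name, and unknown tokens are reported; for the numeric token 440 it prints the same 13:24 word:bit pair shown in the example output above:

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Stub feature-name table, for illustration only. */
  static const char *features[] = { "de", "sep", "smca", "bmi1", "3dnow" };

  static void clear_one(const char *tok)
  {
      char *end;
      long idx = strtol(tok, &end, 10);

      if (*tok && *end == '\0') {                 /* purely numeric: treat as bit index */
          printf("clearing bit %ld:%ld\n", idx / 32, idx % 32);
          return;
      }
      for (size_t i = 0; i < sizeof(features) / sizeof(features[0]); i++)
          if (strcmp(tok, features[i]) == 0) {
              printf("clearing %s\n", tok);
              return;
          }
      printf("(unknown: %s)\n", tok);
  }

  int main(void)
  {
      char arg[] = "de,440,smca,succory,bmi1,3dnow", *tok, *save;

      for (tok = strtok_r(arg, ",", &save); tok; tok = strtok_r(NULL, ",", &save))
          clear_one(tok);
      return 0;
  }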
-
- 25 March 2022, 1 commit
-
-
Committed by Jason A. Donenfeld
If CONFIG_RANDOM_TRUST_CPU is set, the RNG initializes using RDRAND. But the user can disable (or enable) this behavior by setting `random.trust_cpu=0/1` on the kernel command line. This allows system builders to do reasonable things while avoiding howls from tinfoil hatters. (Or vice versa.)

CONFIG_RANDOM_TRUST_BOOTLOADER is basically the same thing, but regards the seed passed via EFI or device tree, which might come from RDRAND or a TPM or somewhere else. In order to allow distros to more easily enable this while avoiding those same howls (or vice versa), this commit adds the corresponding `random.trust_bootloader=0/1` toggle.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Graham Christensen <graham@grahamc.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://github.com/NixOS/nixpkgs/pull/165355
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- 24 March 2022, 2 commits
-
-
Committed by Guilherme G. Piccoli
The panic_print setting allows users to collect more information in a panic event, like memory stats, tasks, CPUs backtraces, etc. This is an interesting debug mechanism, but currently the print event happens *after* kmsg_dump(), meaning that pstore, for example, cannot collect a dmesg with the panic_print extra information.

This patch changes that in 2 steps:

(a) The panic_print setting now allows replaying the existing kernel log buffer to the console (bit 5), besides the extra information dump. This functionality makes sense only at the end of the panic() function. So, we hereby allow distinguishing the two situations by a new boolean parameter in the function panic_print_sys_info().

(b) With the above change, we can safely call panic_print_sys_info() before kmsg_dump(), allowing the extra information to be dumped when using pstore or other kmsg dumpers.

The additional messages from panic_print could overwrite the oldest messages when the buffer is full. The only reasonable solution is to use a large enough log buffer, hence an advisory note was added to the kernel parameters documentation about that.

Link: https://lkml.kernel.org/r/20220214141308.841525-1-gpiccoli@igalia.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
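The reordering described in steps (a) and (b) can be sketched with stand-in functions (illustrative only; the bit-5 value comes from the text above, while the function bodies are simplified stand-ins and the real panic() flow is more involved):

  #include <stdio.h>
  #include <stdbool.h>

  #define PANIC_PRINT_ALL_PRINTK_MSG 0x20   /* bit 5: replay the kernel log, per the commit */

  static unsigned long panic_print = 0x20;  /* what the user passed via panic_print= */

  /* Stand-in for the real routine, just to show the split behavior. */
  static void panic_print_sys_info(bool console_flush)
  {
      if (console_flush) {
          if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
              printf("replaying kernel log buffer to the console\n");
          return;
      }
      printf("dumping extra debug info (mem stats, tasks, ...)\n");
  }

  static void kmsg_dump(void)
  {
      printf("kmsg_dump(): pstore now also captures the extra info above\n");
  }

  int main(void)
  {
      /* Rough shape of the panic() flow after the change: */
      panic_print_sys_info(false);   /* (b) extra info first, so kmsg dumpers see it */
      kmsg_dump();
      panic_print_sys_info(true);    /* (a) optional log replay at the very end      */
      return 0;
  }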
-
Committed by Guilherme G. Piccoli
Currently the "panic_print" parameter/sysctl allows some interesting debug information to be printed during a panic event. This is useful, for example, in cases where the user cannot kdump due to resource limits, or if the user collects panic logs in a serial output (or pstore) and prefers a fast reboot instead of a kdump.

However, there is currently no way to see all CPUs' backtraces in a panic using "panic_print" on architectures that support that. We do have the "oops_all_cpu_backtrace" sysctl, but although partially overlapping in functionality, they are orthogonal in nature: "panic_print" is a panic tuning (and we have panics without oopses, like direct calls to panic() or maybe other paths that don't go through the oops_enter() function), while the original purpose of "oops_all_cpu_backtrace" is to provide more information on oopses for cases in which the users desire to continue running the kernel even after an oops, i.e., it is used in non-panic scenarios.

So, we hereby introduce an additional bit for "panic_print" to allow dumping the CPUs' backtraces during a panic event.

Link: https://lkml.kernel.org/r/20211109202848.610874-3-gpiccoli@igalia.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 23 March 2022, 1 commit
-
-
Committed by Muchun Song
Patch series "Free the 2nd vmemmap page associated with each HugeTLB page", v7.

This series can minimize the overhead of struct page for 2MB HugeTLB pages significantly. It further reduces the overhead of struct page by 12.5% for a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB of HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.

The main implementation and details can be found in the commit log of patch 1. In this series, I have changed the following four helpers; the following table shows the impact on the overhead of those helpers:

  +------------------+-----------+-----------+
  | APIs             | head page | tail page |
  +------------------+-----------+-----------+
  | PageHead()       |     Y     |     N     |
  +------------------+-----------+-----------+
  | PageTail()       |     Y     |     N     |
  +------------------+-----------+-----------+
  | PageCompound()   |     N     |     N     |
  +------------------+-----------+-----------+
  | compound_head()  |     Y     |     N     |
  +------------------+-----------+-----------+

  Y: Overhead is increased.
  N: Overhead is _NOT_ increased.

It shows that the overhead of those helpers on a tail page doesn't change between "hugetlb_free_vmemmap=on" and "hugetlb_free_vmemmap=off", but the overhead on a head page will be increased when "hugetlb_free_vmemmap=on" (except PageCompound()). So I believe that Matthew Wilcox's folio series will help with this.

The users of PageHead() and PageTail() are much fewer than those of compound_head(), and most users of PageTail() are VM_BUG_ON(), so I have done some tests on the overhead of compound_head() on head pages. Calling compound_head() on a head page costs 2.11ns (measured by averaging the call time over 10 million calls to compound_head()). For a head page whose address is not aligned with PAGE_SIZE, or for a non-compound page, the overhead of compound_head() is 2.54ns, which is an increase of 20%. For a head page whose address is aligned with PAGE_SIZE, the overhead of compound_head() is 2.97ns, which is an increase of 40%. Most pages are the former. I do not think the overhead is significant since the overhead of compound_head() itself is low.

This patch (of 5):

This patch minimizes the overhead of struct page for 2MB HugeTLB pages significantly. It further reduces the overhead of struct page by 12.5% for a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB of HugeTLB (2MB type).

After the feature of "Free some vmemmap pages of HugeTLB page" is enabled, the mapping of the vmemmap addresses associated with a 2MB HugeTLB page becomes the following:

  [ASCII diagram: the 8 vmemmap struct pages of a 2MB HugeTLB page (reached via virt_to_page) map to 8 page frames; vmemmap page 0 maps to frame 0 (PG_head), vmemmap page 1 maps to frame 1, and vmemmap pages 2-7 are all remapped to frame 1.]

As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and remapped. However, the 2nd vmemmap page frame can also be freed to the buddy allocator, in which case the mapping changes from the figure above to the following:

  [ASCII diagram: the same layout, but vmemmap pages 1-7 are all remapped to page frame 0, the head page frame.]

After we do this, all tail vmemmap pages (1-7) are mapped to the head vmemmap page frame (0). In other words, there is more than one struct page with PG_head associated with each HugeTLB page. We __know__ that there is only one head struct page; the tail struct pages with PG_head are fake head struct pages. We need an approach to distinguish between those two different types of struct pages so that compound_head(), PageHead() and PageTail() can work properly when the parameter is a tail struct page that has PG_head set.

The following code snippet describes how to distinguish between real and fake head struct pages:

  if (test_bit(PG_head, &page->flags)) {
          unsigned long head = READ_ONCE(page[1].compound_head);

          if (head & 1) {
                  if (head == (unsigned long)page + 1)
                          ==> head page struct
                  else
                          ==> tail page struct
          } else
                  ==> head page struct
  }

We can safely access the field of @page[1] with PG_head because @page is a compound page composed of at least two contiguous pages.

[songmuchun@bytedance.com: restore lost comment changes]

Link: https://lkml.kernel.org/r/20211101031651.75851-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20211101031651.75851-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 17 March 2022, 1 commit
-
-
Committed by Mike Rapoport
The existing description of mem= does not cover all the cases and differences in how architectures treat it. Extend the description to match the code.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220310082736.1346366-1-rppt@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
- 12 March 2022, 1 commit
-
-
Committed by Steven Rostedt (Google)
Add the ftrace_boot_snapshot kernel parameter, which takes a snapshot at the end of boot up, just before switching over to user space (it happens during the kernel's freeing of init memory). This is useful when there's interesting data that can be collected from kernel start up, but gets overridden by user space start up code. With this option, the ring buffer content from the boot up traces gets saved in the snapshot at the end of boot up. This trace can be read from:

  /sys/kernel/tracing/snapshot

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
- 28 February 2022, 1 commit
-
-
Committed by Armin Wolf
Add documentation for fan_mult and fan_max.

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20220109214248.61759-3-W_Armin@gmx.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
-