提交 · 93285c01977729a2e046e065e4b99791b966130c · openeuler / Kernel

30 5月, 2019 1 次提交

doc: kernel-parameters.txt: fix documentation of nmi_watchdog parameter · 93285c01

由 Zhenzhong Duan 提交于 5月 21, 2019

The default behavior of hardlockup depends on the config of
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.

Fix the description of nmi_watchdog to make it clear.
Suggested-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: NIngo Molnar <mingo@kernel.org>
Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: NJonathan Corbet <corbet@lwn.net>

93285c01

23 5月, 2019 1 次提交

docs: fix numaperf.rst and add it to the doc tree · 8867f610

由 Jonathan Corbet 提交于 5月 22, 2019

Commit 13bac55e ("doc/mm: New documentation for memory performance")
added numaperf.rst, but did not add it to the TOC tree. There was also an
incorrectly marked literal block leading to this warning sequence:

numaperf.rst:24: WARNING: Unexpected indentation.
numaperf.rst:24: WARNING: Inline substitution_reference start-string without end-string.
numaperf.rst:25: WARNING: Block quote ends without a blank line; unexpected unindent.

Fix the block and add the file to the document tree.

Fixes: 13bac55e ("doc/mm: New documentation for memory performance")
Cc: Keith Busch <keith.busch@intel.com>
Reviewed-by: NMike Rapoport <rppt@linux.ibm.com>
Signed-off-by: NJonathan Corbet <corbet@lwn.net>

8867f610

19 5月, 2019 1 次提交

panic: add an option to replay all the printk message in buffer · de6da1e8

由 Feng Tang 提交于 5月 17, 2019

Currently on panic, kernel will lower the loglevel and print out pending
printk msg only with console_flush_on_panic().

Add an option for users to configure the "panic_print" to replay all
dmesg in buffer, some of which they may have never seen due to the
loglevel setting, which will help panic debugging .

[feng.tang@intel.com: keep the original console_flush_on_panic() inside panic()]
  Link: http://lkml.kernel.org/r/1556199137-14163-1-git-send-email-feng.tang@intel.com
[feng.tang@intel.com: use logbuf lock to protect the console log index]
  Link: http://lkml.kernel.org/r/1556269868-22654-1-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1556095872-36838-1-git-send-email-feng.tang@intel.comSigned-off-by: NFeng Tang <feng.tang@intel.com>
Reviewed-by: NPetr Mladek <pmladek@suse.com>
Cc: Aaro Koskinen <aaro.koskinen@nokia.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

de6da1e8

15 5月, 2019 3 次提交

ipc: allow boot time extension of IPCMNI from 32k to 16M · 5ac893b8

由 Waiman Long 提交于 5月 14, 2019

The maximum number of unique System V IPC identifiers was limited to
32k.  That limit should be big enough for most use cases.

However, there are some users out there requesting for more, especially
those that are migrating from Solaris which uses 24 bits for unique
identifiers.  To satisfy the need of those users, a new boot time kernel
option "ipcmni_extend" is added to extend the IPCMNI value to 16M.  This
is a 512X increase which should be big enough for users out there that
need a large number of unique IPC identifier.

The use of this new option will change the pattern of the IPC
identifiers returned by functions like shmget(2).  An application that
depends on such pattern may not work properly.  So it should only be
used if the users really need more than 32k of unique IPC numbers.

This new option does have the side effect of reducing the maximum number
of unique sequence numbers from 64k down to 128.  So it is a trade-off.

The computation of a new IPC id is not done in the performance critical
path.  So a little bit of additional overhead shouldn't have any real
performance impact.

Link: http://lkml.kernel.org/r/20190329204930.21620-1-longman@redhat.comSigned-off-by: NWaiman Long <longman@redhat.com>
Acked-by: NManfred Spraul <manfred@colorfullife.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5ac893b8

panic/reboot: allow specifying reboot_mode for panic only · b287a25a

由 Aaro Koskinen 提交于 5月 14, 2019

Allow specifying reboot_mode for panic only. This is needed on systems
where ramoops is used to store panic logs, and user wants to use warm
reset to preserve those, while still having cold reset on normal
reboots.

Link: http://lkml.kernel.org/r/20190322004735.27702-1-aaro.koskinen@iki.fiSigned-off-by: NAaro Koskinen <aaro.koskinen@nokia.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b287a25a

mm: shuffle initial free memory to improve memory-side-cache utilization · e900a918

由 Dan Williams 提交于 5月 14, 2019

Patch series "mm: Randomize free memory", v10.

This patch (of 3):

Randomization of the page allocator improves the average utilization of
a direct-mapped memory-side-cache.  Memory side caching is a platform
capability that Linux has been previously exposed to in HPC
(high-performance computing) environments on specialty platforms.  In
that instance it was a smaller pool of high-bandwidth-memory relative to
higher-capacity / lower-bandwidth DRAM.  Now, this capability is going
to be found on general purpose server platforms where DRAM is a cache in
front of higher latency persistent memory [1].

Robert offered an explanation of the state of the art of Linux
interactions with memory-side-caches [2], and I copy it here:

    It's been a problem in the HPC space:
    http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/

    A kernel module called zonesort is available to try to help:
    https://software.intel.com/en-us/articles/xeon-phi-software

    and this abandoned patch series proposed that for the kernel:
    https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com

    Dan's patch series doesn't attempt to ensure buffers won't conflict, but
    also reduces the chance that the buffers will. This will make performance
    more consistent, albeit slower than "optimal" (which is near impossible
    to attain in a general-purpose kernel).  That's better than forcing
    users to deploy remedies like:
        "To eliminate this gradual degradation, we have added a Stream
         measurement to the Node Health Check that follows each job;
         nodes are rebooted whenever their measured memory bandwidth
         falls below 300 GB/s."

A replacement for zonesort was merged upstream in commit cc9aec03
("x86/numa_emulation: Introduce uniform split capability").  With this
numa_emulation capability, memory can be split into cache sized
("near-memory" sized) numa nodes.  A bind operation to such a node, and
disabling workloads on other nodes, enables full cache performance.
However, once the workload exceeds the cache size then cache conflicts
are unavoidable.  While HPC environments might be able to tolerate
time-scheduling of cache sized workloads, for general purpose server
platforms, the oversubscribed cache case will be the common case.

The worst case scenario is that a server system owner benchmarks a
workload at boot with an un-contended cache only to see that performance
degrade over time, even below the average cache performance due to
excessive conflicts.  Randomization clips the peaks and fills in the
valleys of cache utilization to yield steady average performance.

Here are some performance impact details of the patches:

1/ An Intel internal synthetic memory bandwidth measurement tool, saw a
   3X speedup in a contrived case that tries to force cache conflicts.
   The contrived cased used the numa_emulation capability to force an
   instance of the benchmark to be run in two of the near-memory sized
   numa nodes.  If both instances were placed on the same emulated they
   would fit and cause zero conflicts.  While on separate emulated nodes
   without randomization they underutilized the cache and conflicted
   unnecessarily due to the in-order allocation per node.

2/ A well known Java server application benchmark was run with a heap
   size that exceeded cache size by 3X.  The cache conflict rate was 8%
   for the first run and degraded to 21% after page allocator aging.  With
   randomization enabled the rate levelled out at 11%.

3/ A MongoDB workload did not observe measurable difference in
   cache-conflict rates, but the overall throughput dropped by 7% with
   randomization in one case.

4/ Mel Gorman ran his suite of performance workloads with randomization
   enabled on platforms without a memory-side-cache and saw a mix of some
   improvements and some losses [3].

While there is potentially significant improvement for applications that
depend on low latency access across a wide working-set, the performance
may be negligible to negative for other workloads.  For this reason the
shuffle capability defaults to off unless a direct-mapped
memory-side-cache is detected.  Even then, the page_alloc.shuffle=0
parameter can be specified to disable the randomization on those systems.

Outside of memory-side-cache utilization concerns there is potentially
security benefit from randomization.  Some data exfiltration and
return-oriented-programming attacks rely on the ability to infer the
location of sensitive data objects.  The kernel page allocator, especially
early in system boot, has predictable first-in-first out behavior for
physical pages.  Pages are freed in physical address order when first
onlined.

Quoting Kees:
    "While we already have a base-address randomization
     (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
     memory layouts would certainly be using the predictability of
     allocation ordering (i.e. for attacks where the base address isn't
     important: only the relative positions between allocated memory).
     This is common in lots of heap-style attacks. They try to gain
     control over ordering by spraying allocations, etc.

     I'd really like to see this because it gives us something similar
     to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."

While SLAB_FREELIST_RANDOM reduces the predictability of some local slab
caches it leaves vast bulk of memory to be predictably in order allocated.
However, it should be noted, the concrete security benefits are hard to
quantify, and no known CVE is mitigated by this randomization.

Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform
a Fisher-Yates shuffle of the page allocator 'free_area' lists when they
are initially populated with free memory at boot and at hotplug time.  Do
this based on either the presence of a page_alloc.shuffle=Y command line
parameter, or autodetection of a memory-side-cache (to be added in a
follow-on patch).

The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free
pages where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1 i.e.  10,
4MB this trades off randomization granularity for time spent shuffling.
MAX_ORDER-1 was chosen to be minimally invasive to the page allocator
while still showing memory-side cache behavior improvements, and the
expectation that the security implications of finer granularity
randomization is mitigated by CONFIG_SLAB_FREELIST_RANDOM.  The
performance impact of the shuffling appears to be in the noise compared to
other memory initialization work.

This initial randomization can be undone over time so a follow-on patch is
introduced to inject entropy on page free decisions.  It is reasonable to
ask if the page free entropy is sufficient, but it is not enough due to
the in-order initial freeing of pages.  At the start of that process
putting page1 in front or behind page0 still keeps them close together,
page2 is still near page1 and has a high chance of being adjacent.  As
more pages are added ordering diversity improves, but there is still high
page locality for the low address pages and this leads to no significant
impact to the cache conflict rate.

[1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
[2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM
[3]: https://lkml.org/lkml/2018/10/12/309

[dan.j.williams@intel.com: fix shuffle enable]
  Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com
[cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts]
  Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw
Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NQian Cai <cai@lca.pw>
Reviewed-by: NKees Cook <keescook@chromium.org>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Robert Elliott <elliott@hpe.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e900a918

08 5月, 2019 2 次提交

Documentation: Correct the possible MDS sysfs values · ea01668f

由 Tyler Hicks 提交于 5月 06, 2019

Adjust the last two rows in the table that display possible values when
MDS mitigation is enabled. They both were slightly innacurate.

In addition, convert the table of possible values and their descriptions
to a list-table. The simple table format uses the top border of equals
signs to determine cell width which resulted in the first column being
far too wide in comparison to the second column that contained the
majority of the text.
Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

ea01668f

x86/mds: Add MDSUM variant to the MDS documentation · e672f8bf

由 speck for Pawan Gupta 提交于 5月 06, 2019

Updated the documentation for a new CVE-2019-11091 Microarchitectural Data
Sampling Uncacheable Memory (MDSUM) which is a variant of
Microarchitectural Data Sampling (MDS). MDS is a family of side channel
attacks on internal buffers in Intel CPUs.

MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
memory that takes a fault or assist can leave data in a microarchitectural
structure that may later be observed using one of the same methods used by
MSBDS, MFBDS or MLPDS. There are no new code changes expected for MDSUM.
The existing mitigation for MDS applies to MDSUM as well.
Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NTyler Hicks <tyhicks@canonical.com>
Reviewed-by: NJon Masters <jcm@redhat.com>

e672f8bf

01 5月, 2019 2 次提交

Documentation: Add ARM64 to kernel-parameters.rst · 4ad499c9

由 Josh Poimboeuf 提交于 4月 12, 2019

Add ARM64 to the legend of architectures.  It's already used in several
places in kernel-parameters.txt.
Suggested-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

4ad499c9

arm64/speculation: Support 'mitigations=' cmdline option · a111b7c0

由 Josh Poimboeuf 提交于 4月 12, 2019

Configure arm64 runtime CPU speculation bug mitigations in accordance
with the 'mitigations=' cmdline option.  This affects Meltdown, Spectre
v2, and Speculative Store Bypass.

The default behavior is unchanged.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
[will: reorder checks so KASLR implies KPTI and SSBS is affected by cmdline]
Signed-off-by: NWill Deacon <will.deacon@arm.com>

a111b7c0

29 4月, 2019 2 次提交

s390/pci: add parameter to disable usage of MIO instructions · 56271303

由 Sebastian Ott 提交于 4月 18, 2019

Allow users to disable usage of MIO instructions by specifying pci=nomio
at the kernel command line.
Signed-off-by: NSebastian Ott <sebott@linux.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

56271303

s390/pci: add parameter to force floating irqs · fbfe07d4

由 Sebastian Ott 提交于 2月 26, 2019

Provide a kernel parameter to force the usage of floating interrupts.
Signed-off-by: NSebastian Ott <sebott@linux.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

fbfe07d4

26 4月, 2019 7 次提交

arm64: Provide a command line to disable spectre_v2 mitigation · e5ce5e72

由 Jeremy Linton 提交于 4月 15, 2019

There are various reasons, such as benchmarking, to disable spectrev2
mitigation on a machine. Provide a command-line option to do so.
Signed-off-by: NJeremy Linton <jeremy.linton@arm.com>
Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: NAndre Przywara <andre.przywara@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Tested-by: NStefan Wahren <stefan.wahren@i2se.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: NWill Deacon <will.deacon@arm.com>

e5ce5e72

Documentation: ACPI: move ssdt-overlays.txt to admin-guide/acpi and convert to reST · 7fe19072

由 Changbin Du 提交于 4月 25, 2019

This converts the plain text documentation to reStructuredText format
and adds it to Sphinx TOC tree.

No essential content change.
Signed-off-by: NChangbin Du <changbin.du@gmail.com>
Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

7fe19072

Documentation: ACPI: move cppc_sysfs.txt to admin-guide/acpi and convert to reST · 3e57460f

由 Changbin Du 提交于 4月 25, 2019

This converts the plain text documentation to reStructuredText format
and adds it to Sphinx TOC tree.

No essential content change.
Signed-off-by: NChangbin Du <changbin.du@gmail.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

3e57460f

Documentation: ACPI: move dsdt-override.txt to admin-guide/acpi and convert to reST · 34bf473b

由 Changbin Du 提交于 4月 25, 2019

This converts the plain text documentation to reStructuredText format
and adds it to Sphinx TOC tree.

No essential content change.
Signed-off-by: NChangbin Du <changbin.du@gmail.com>
Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

34bf473b

Documentation: ACPI: move initrd_table_override.txt to admin-guide/acpi and convert to reST · 59bcdccc

由 Changbin Du 提交于 4月 25, 2019

This converts the plain text documentation to reStructuredText format
and adds it to Sphinx TOC tree.

No essential content change.
Signed-off-by: NChangbin Du <changbin.du@gmail.com>
Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

59bcdccc

Documentation: add Linux ACPI to Sphinx TOC tree · 680e6ffa

由 Changbin Du 提交于 4月 25, 2019

Add below index.rst files for ACPI subsystem. More docs will be added later.
  o admin-guide/acpi/index.rst
  o driver-api/acpi/index.rst
  o firmware-guide/index.rst
Signed-off-by: NChangbin Du <changbin.du@gmail.com>
Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

680e6ffa

docs: ext4.rst: document case-insensitive directories · 0a790fe4

由 Gabriel Krisman Bertazi 提交于 4月 25, 2019

Introduces the case-insensitive features on ext4 for system
administrators.  Explain the minimum of design decisions that are
important for sysadmins wanting to enable this feature.
Signed-off-by: NGabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

0a790fe4

23 4月, 2019 1 次提交

x86/xen: Add "xen_timer_slop" command line option · 2ec16bc0

由 Ryan Thibodeaux 提交于 3月 22, 2019

Add a new command-line option "xen_timer_slop=<INT>" that sets the
minimum delta of virtual Xen timers. This commit does not change the
default timer slop value for virtual Xen timers.

Lowering the timer slop value should improve the accuracy of virtual
timers (e.g., better process dispatch latency), but it will likely
increase the number of virtual timer interrupts (relative to the
original slop setting).

The original timer slop value has not changed since the introduction
of the Xen-aware Linux kernel code. This commit provides users an
opportunity to tune timer performance given the refinements to
hardware and the Xen event channel processing. It also mirrors
a feature in the Xen hypervisor - the "timer_slop" Xen command line
option.

[boris: updated comment describing TIMER_SLOP]
Signed-off-by: NRyan Thibodeaux <ryan.thibodeaux@starlab.io>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

2ec16bc0

22 4月, 2019 1 次提交

x86/kdump: Fall back to reserve high crashkernel memory · b9ac3849

由 Dave Young 提交于 4月 22, 2019

crashkernel=xM tries to reserve memory for the crash kernel under 4G,
which is enough, usually. But this could fail sometimes, for example
when one tries to reserve a big chunk like 2G, for example.

So let the crashkernel=xM just fall back to use high memory in case it
fails to find a suitable low range. Do not set the ,high as default
because it allocates extra low memory for DMA buffers and swiotlb, and
this is not always necessary for all machines.

Typically, crashkernel=128M usually works with low reservation under 4G,
so keep <4G as default.

 [ bp: Massage. ]
Signed-off-by: NDave Young <dyoung@redhat.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NIngo Molnar <mingo@kernel.org>
Acked-by: NBaoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: linux-doc@vger.kernel.org
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: piliu@redhat.com
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thymo van Beers <thymovanbeers@gmail.com>
Cc: vgoyal@redhat.com
Cc: x86-ml <x86@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Zhimin Gu <kookoo.gu@intel.com>
Link: https://lkml.kernel.org/r/20190422031905.GA8387@dhcp-128-65.nay.redhat.com

b9ac3849

21 4月, 2019 2 次提交

powerpc: Add a framework for Kernel Userspace Access Protection · de78a9c4

由 Christophe Leroy 提交于 4月 18, 2019

This patch implements a framework for Kernel Userspace Access
Protection.

Then subarches will have the possibility to provide their own
implementation by providing setup_kuap() and
allow/prevent_user_access().

Some platforms will need to know the area accessed and whether it is
accessed from read, write or both. Therefore source, destination and
size and handed over to the two functions.

mpe: Rename to allow/prevent rather than unlock/lock, and add
read/write wrappers. Drop the 32-bit code for now until we have an
implementation for it. Add kuap to pt_regs for 64-bit as well as
32-bit. Don't split strings, use pr_crit_ratelimited().
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NRussell Currey <ruscur@russell.cc>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

de78a9c4

powerpc: Add skeleton for Kernel Userspace Execution Prevention · 0fb1c25a

由 Christophe Leroy 提交于 4月 18, 2019

This patch adds a skeleton for Kernel Userspace Execution Prevention.

Then subarches implementing it have to define CONFIG_PPC_HAVE_KUEP
and provide setup_kuep() function.
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
[mpe: Don't split strings, use pr_crit_ratelimited()]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0fb1c25a

20 4月, 2019 1 次提交

cgroup: document cgroup v2 freezer interface · afe471ea

由 Roman Gushchin 提交于 4月 19, 2019

Describe cgroup v2 freezer interface in the cgroup v2 admin guide.
Signed-off-by: NRoman Gushchin <guro@fb.com>
Reviewed-by: NMike Rapoport <rppt@linux.ibm.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: linux-doc@vger.kernel.org
Cc: kernel-team@fb.com

afe471ea

18 4月, 2019 5 次提交

x86/speculation/mds: Add 'mitigations=' support for MDS · 5c14068f

由 Josh Poimboeuf 提交于 4月 17, 2019

Add MDS to the new 'mitigations=' cmdline option.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

5c14068f

s390/speculation: Support 'mitigations=' cmdline option · 0336e04a

由 Josh Poimboeuf 提交于 4月 12, 2019

Configure s390 runtime CPU speculation bug mitigations in accordance
with the 'mitigations=' cmdline option.  This affects Spectre v1 and
Spectre v2.

The default behavior is unchanged.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: NJiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/e4a161805458a5ec88812aac0307ae3908a030fc.1555085500.git.jpoimboe@redhat.com

0336e04a

powerpc/speculation: Support 'mitigations=' cmdline option · 782e69ef

由 Josh Poimboeuf 提交于 4月 12, 2019

Configure powerpc CPU runtime speculation bug mitigations in accordance
with the 'mitigations=' cmdline option.  This affects Meltdown, Spectre
v1, Spectre v2, and Speculative Store Bypass.

The default behavior is unchanged.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: NJiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/245a606e1a42a558a310220312d9b6adb9159df6.1555085500.git.jpoimboe@redhat.com

782e69ef

x86/speculation: Support 'mitigations=' cmdline option · d68be4c4

由 Josh Poimboeuf 提交于 4月 12, 2019

Configure x86 runtime CPU speculation bug mitigations in accordance with
the 'mitigations=' cmdline option.  This affects Meltdown, Spectre v2,
Speculative Store Bypass, and L1TF.

The default behavior is unchanged.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: NJiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/6616d0ae169308516cfdf5216bedd169f8a8291b.1555085500.git.jpoimboe@redhat.com

d68be4c4

cpu/speculation: Add 'mitigations=' cmdline option · 98af8452

由 Josh Poimboeuf 提交于 4月 12, 2019

Keeping track of the number of mitigations for all the CPU speculation
bugs has become overwhelming for many users.  It's getting more and more
complicated to decide which mitigations are needed for a given
architecture.  Complicating matters is the fact that each arch tends to
have its own custom way to mitigate the same vulnerability.

Most users fall into a few basic categories:

a) they want all mitigations off;

b) they want all reasonable mitigations on, with SMT enabled even if
   it's vulnerable; or

c) they want all reasonable mitigations on, with SMT disabled if
   vulnerable.

Define a set of curated, arch-independent options, each of which is an
aggregation of existing options:

- mitigations=off: Disable all mitigations.

- mitigations=auto: [default] Enable all the default mitigations, but
  leave SMT enabled, even if it's vulnerable.

- mitigations=auto,nosmt: Enable all the default mitigations, disabling
  SMT if needed by a mitigation.

Currently, these options are placeholders which don't actually do
anything.  They will be fleshed out in upcoming patches.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: NJiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com

98af8452

11 4月, 2019 1 次提交
- P
  doc/kernel-parameters.txt: Deprecate ima_appraise_tcb · 41475a3e
  由 Petr Vorel 提交于 4月 04, 2019
```
Signed-off-by: NPetr Vorel <pvorel@suse.cz>
Signed-off-by: NMimi Zohar <zohar@linux.ibm.com>
```
  41475a3e
08 4月, 2019 6 次提交

admin-guide: pm: intel_epb: Add SPDX license tag and copyright notice · 7973b799

由 Rafael J. Wysocki 提交于 4月 04, 2019

Add an SPDX license tag and a copyright notice to the intel_epb.rst
file under Documentation/admin-quide/pm.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>

7973b799

Documentation: PM: Unify copyright notices · fc1860d6

由 Rafael J. Wysocki 提交于 4月 04, 2019

Unify copyright notices in the .rst files under
Documentation/driver-api/pm and Documentation/admin-quide/pm.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>

fc1860d6

Documentation: PM: Add SPDX license tags to multiple files · fc7db767

由 Rafael J. Wysocki 提交于 4月 04, 2019

Add SPDX license tags to .rst files under Documentation/driver-api/pm
and Documentation/admin-quide/pm.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>

fc7db767

cpufreq: intel_pstate: Documentation: Add references sections · 1120b0f9

由 Rafael J. Wysocki 提交于 4月 04, 2019

Add separate refereces sections to the cpufreq.rst and
intel_pstate.rst documents under admin-quide/pm and list the
references to external documentation in there.

Update the ACPI specification URL while at it.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>

1120b0f9

PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs interface · b9c273ba

由 Rafael J. Wysocki 提交于 3月 21, 2019

The Performance and Energy Bias Hint (EPB) is expected to be set by
user space through the generic MSR interface, but that interface is
not particularly nice and there are security concerns regarding it,
so it is not always available.

For this reason, add a sysfs interface for reading and updating the
EPB, in the form of a new attribute, energy_perf_bias, located
under /sys/devices/system/cpu/cpu#/power/ for online CPUs that
support the EPB feature.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Acked-by: NBorislav Petkov <bp@suse.de>

b9c273ba

PM / arch: x86: Rework the MSR_IA32_ENERGY_PERF_BIAS handling · 5861381d

由 Rafael J. Wysocki 提交于 3月 21, 2019

The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is
problematic, because it may cause changes made by user space to that
MSR (with the help of the x86_energy_perf_policy tool, for example)
to be lost every time a CPU goes offline and then back online as well
as during system-wide power management transitions into sleep states
and back into the working state.

The first problem is that if the current EPB value for a CPU going
online is 0 ('performance'), the kernel will change it to 6 ('normal')
regardless of whether or not this is the first bring-up of that CPU.
That also happens during system-wide resume from sleep states
(including, but not limited to, hibernation).  However, the EPB may
have been adjusted by user space this way and the kernel should not
blindly override that setting.

The second problem is that if the platform firmware resets the EPB
values for any CPUs during system-wide resume from a sleep state,
the kernel will not restore their previous EPB values that may
have been set by user space before the preceding system-wide
suspend transition.  Again, that behavior may at least be confusing
from the user space perspective.

In order to address these issues, rework the handling of
MSR_IA32_ENERGY_PERF_BIAS so that the EPB value is saved on CPU
offline and restored on CPU online as well as (for the boot CPU)
during the syscore stages of system-wide suspend and resume
transitions, respectively.

However, retain the policy by which the EPB is set to 6 ('normal')
on the first bring-up of each CPU if its initial value is 0, based
on the observation that 0 may mean 'not initialized' just as well as
'performance' in that case.

While at it, move the MSR_IA32_ENERGY_PERF_BIAS handling code into
a separate file and document it in Documentation/admin-guide.

Fixes: abe48b10 (x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS)
Fixes: b51ef52d (x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume)
Reported-by: NThomas Renninger <trenn@suse.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Acked-by: NThomas Gleixner <tglx@linutronix.de>

5861381d

05 4月, 2019 1 次提交

doc/mm: New documentation for memory performance · 13bac55e

由 Keith Busch 提交于 3月 11, 2019

Platforms may provide system memory where some physical address ranges
perform differently than others, or is cached by the system on the
memory side.

Add documentation describing a high level overview of such systems and the
perforamnce and caching attributes the kernel provides for applications
wishing to query this information.
Reviewed-by: NMike Rapoport <rppt@linux.ibm.com>
Reviewed-by: NJonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Tested-by: NBrice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

13bac55e

03 4月, 2019 1 次提交

x86/speculation/mds: Add mds=full,nosmt cmdline option · d71eb0ce

由 Josh Poimboeuf 提交于 4月 02, 2019

Add the mds=full,nosmt cmdline option.  This is like mds=full, but with
SMT disabled if the CPU is vulnerable.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NTyler Hicks <tyhicks@canonical.com>
Acked-by: NJiri Kosina <jkosina@suse.cz>

d71eb0ce

27 3月, 2019 1 次提交

rcu: Allow rcu_nocbs= to specify all CPUs · da8739f2

由 Paul E. McKenney 提交于 3月 05, 2019

Currently, the rcu_nocbs= kernel boot parameter requires that a specific
list of CPUs be specified, and has no way to say "all of them".
As noted by user RavFX in a comment to Phoronix topic 1002538, this
is an inconvenient side effect of the removal of the RCU_NOCB_CPU_ALL
Kconfig option. This commit therefore enables the rcu_nocbs= kernel boot
parameter to be given the string "all", as in "rcu_nocbs=all" to specify
that all CPUs on the system are to have their RCU callbacks offloaded.

Another approach would be to make cpulist_parse() check for "all", but
there are uses of cpulist_parse() that do other checking, which could
conflict with an "all". This commit therefore focuses on the specific
use of cpulist_parse() in rcu_nocb_setup().

Just a note to other people who would like changes to Linux-kernel RCU:
If you send your requests to me directly, they might get fixed somewhat
faster. RavFX's comment was posted on January 22, 2018 and I first saw
it on March 5, 2019. And the only reason that I found it -at- -all- was
that I was looking for projects using RCU, and my search engine showed
me that Phoronix comment quite by accident. Your choice, though! ;-)
Signed-off-by: NPaul E. McKenney <paulmck@linux.ibm.com>

da8739f2

22 3月, 2019 1 次提交

x86/tsc: Add option to disable tsc clocksource watchdog · 0f0b7e1c

由 Juri Lelli 提交于 3月 07, 2019

Clocksource watchdog has been found responsible for generating latency
spikes (in the 10-20 us range) when woken up to check for TSC stability.

Add an option to disable it at boot.
Signed-off-by: NJuri Lelli <juri.lelli@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: bigeasy@linutronix.de
Cc: linux-rt-users@vger.kernel.org
Cc: peterz@infradead.org
Cc: bristot@redhat.com
Cc: williams@redhat.com
Link: https://lkml.kernel.org/r/20190307120913.13168-1-juri.lelli@redhat.com

0f0b7e1c

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功