提交 · 2951067089a3137bc8dce9fdd96f6e6eb3ed3c4d · openanolis / cloud-kernel

15 5月, 2019 1 次提交

x86/speculation/mds: Add mitigation control for MDS · 29510670

由 Thomas Gleixner 提交于 2月 18, 2019

commit bc1241700acd82ec69fde98c5763ce51086269f8 upstream

Now that the mitigations are in place, add a command line parameter to
control the mitigation, a mitigation selector function and a SMT update
mechanism.

This is the minimal straight forward initial implementation which just
provides an always on/off mode. The command line parameter is:

  mds=[full|off]

This is consistent with the existing mitigations for other speculative
hardware vulnerabilities.

The idle invocation is dynamically updated according to the SMT state of
the system similar to the dynamic update of the STIBP mitigation. The idle
mitigation is limited to CPUs which are only affected by MSBDS and not any
other variant, because the other variants cannot be mitigated on SMT
enabled systems.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NJon Masters <jcm@redhat.com>
Tested-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

29510670

02 5月, 2019 1 次提交

powerpc/fsl: Add FSL_PPC_BOOK3E as supported arch for nospectre_v2 boot arg · 5cb299c8

由 Diana Craciun 提交于 12月 12, 2018

commit e59f5bd759b7dee57593c5b6c0441609bda5d530 upstream.
Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

5cb299c8

10 1月, 2019 1 次提交

x86/speculation/l1tf: Drop the swap storage limit restriction when l1tf=off · 86ba6f66

由 Michal Hocko 提交于 11月 13, 2018

commit 5b5e4d623ec8a34689df98e42d038a3b594d2ff9 upstream.

Swap storage is restricted to max_swapfile_size (~16TB on x86_64) whenever
the system is deemed affected by L1TF vulnerability. Even though the limit
is quite high for most deployments it seems to be too restrictive for
deployments which are willing to live with the mitigation disabled.

We have a customer to deploy 8x 6,4TB PCIe/NVMe SSD swap devices which is
clearly out of the limit.

Drop the swap restriction when l1tf=off is specified. It also doesn't make
much sense to warn about too much memory for the l1tf mitigation when it is
forcefully disabled by the administrator.

[ tglx: Folded the documentation delta change ]

Fixes: 377eeaa8 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
Signed-off-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: <linux-mm@kvack.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181113184910.26697-1-mhocko@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

86ba6f66

06 12月, 2018 4 次提交

x86/speculation: Provide IBPB always command line options · 9f3baace

由 Thomas Gleixner 提交于 11月 25, 2018

commit 55a974021ec952ee460dc31ca08722158639de72 upstream

Provide the possibility to enable IBPB always in combination with 'prctl'
and 'seccomp'.

Add the extra command line options and rework the IBPB selection to
evaluate the command instead of the mode selected by the STIPB switch case.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NIngo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185006.144047038@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

9f3baace

x86/speculation: Add seccomp Spectre v2 user space protection mode · d1ec2354

由 Thomas Gleixner 提交于 11月 25, 2018

commit 6b3e64c237c072797a9ec918654a60e3a46488e2 upstream

If 'prctl' mode of user space protection from spectre v2 is selected
on the kernel command-line, STIBP and IBPB are applied on tasks which
restrict their indirect branch speculation via prctl.

SECCOMP enables the SSBD mitigation for sandboxed tasks already, so it
makes sense to prevent spectre v2 user space to user space attacks as
well.

The Intel mitigation guide documents how STIPB works:
    
   Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
   prevents the predicted targets of indirect branches on any logical
   processor of that core from being controlled by software that executes
   (or executed previously) on another logical processor of the same core.

Ergo setting STIBP protects the task itself from being attacked from a task
running on a different hyper-thread and protects the tasks running on
different hyper-threads from being attacked.

While the document suggests that the branch predictors are shielded between
the logical processors, the observed performance regressions suggest that
STIBP simply disables the branch predictor more or less completely. Of
course the document wording is vague, but the fact that there is also no
requirement for issuing IBPB when STIBP is used points clearly in that
direction. The kernel still issues IBPB even when STIBP is used until Intel
clarifies the whole mechanism.

IBPB is issued when the task switches out, so malicious sandbox code cannot
mistrain the branch predictor for the next user space task on the same
logical processor.
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NIngo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

d1ec2354

x86/speculation: Enable prctl mode for spectre_v2_user · 7b62ef14

由 Thomas Gleixner 提交于 11月 25, 2018

commit 7cc765a67d8e04ef7d772425ca5a2a1e2b894c15 upstream

Now that all prerequisites are in place:

 - Add the prctl command line option

 - Default the 'auto' mode to 'prctl'

 - When SMT state changes, update the static key which controls the
   conditional STIBP evaluation on context switch.

 - At init update the static key which controls the conditional IBPB
   evaluation on context switch.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NIngo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185005.958421388@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

7b62ef14

x86/speculation: Add command line control for indirect branch speculation · 71187543

由 Thomas Gleixner 提交于 11月 25, 2018

commit fa1202ef224391b6f5b26cdd44cc50495e8fab54 upstream

Add command line control for user space indirect branch speculation
mitigations. The new option is: spectre_v2_user=

The initial options are:

    -  on:   Unconditionally enabled
    - off:   Unconditionally disabled
    -auto:   Kernel selects mitigation (default off for now)

When the spectre_v2= command line argument is either 'on' or 'off' this
implies that the application to application control follows that state even
if a contradicting spectre_v2_user= argument is supplied.
Originally-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NIngo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185005.082720373@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

71187543

27 11月, 2018 2 次提交

USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub · ed8acd13

由 Kai-Heng Feng 提交于 10月 19, 2018

commit 781f0766cc41a9dd2e5d118ef4b1d5d89430257b upstream.

Devices connected under Terminus Technology Inc. Hub (1a40:0101) may
fail to work after the system resumes from suspend:
[  206.063325] usb 3-2.4: reset full-speed USB device number 4 using xhci_hcd
[  206.143691] usb 3-2.4: device descriptor read/64, error -32
[  206.351671] usb 3-2.4: device descriptor read/64, error -32

Info for this hub:
T:  Bus=03 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  2 Spd=480 MxCh= 4
D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=1a40 ProdID=0101 Rev=01.11
S:  Product=USB 2.0 Hub
C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=100mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub

Some expirements indicate that the USB devices connected to the hub are
innocent, it's the hub itself is to blame. The hub needs extra delay
time after it resets its port.

Hence wait for extra delay, if the device is connected to this quirky
hub.
Signed-off-by: NKai-Heng Feng <kai.heng.feng@canonical.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: NAlan Stern <stern@rowland.harvard.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

ed8acd13

x86/earlyprintk: Add a force option for pciserial device · 9f0e46bf

由 Feng Tang 提交于 10月 03, 2018

[ Upstream commit d2266bbfa9e3e32e3b642965088ca461bd24a94f ]

The "pciserial" earlyprintk variant helps much on many modern x86
platforms, but unfortunately there are still some platforms with PCI
UART devices which have the wrong PCI class code. In that case, the
current class code check does not allow for them to be used for logging.

Add a sub-option "force" which overrides the class code check and thus
the use of such device can be enforced.

 [ bp: massage formulations. ]
Suggested-by: NBorislav Petkov <bp@alien8.de>
Signed-off-by: NFeng Tang <feng.tang@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Stuart R . Anderson" <stuart.r.anderson@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H Peter Anvin <hpa@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thymo van Beers <thymovanbeers@gmail.com>
Cc: alan@linux.intel.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20181002164921.25833-1-feng.tang@intel.comSigned-off-by: NSasha Levin <sashal@kernel.org>

9f0e46bf

14 9月, 2018 1 次提交

xen/balloon: add runtime control for scrubbing ballooned out pages · 197ecb38

由 Marek Marczykowski-Górecki 提交于 9月 07, 2018

Scrubbing pages on initial balloon down can take some time, especially
in nested virtualization case (nested EPT is slow). When HVM/PVH guest is
started with memory= significantly lower than maxmem=, all the extra
pages will be scrubbed before returning to Xen. But since most of them
weren't used at all at that point, Xen needs to populate them first
(from populate-on-demand pool). In nested virt case (Xen inside KVM)
this slows down the guest boot by 15-30s with just 1.5GB needed to be
returned to Xen.

Add runtime parameter to enable/disable it, to allow initially disabling
scrubbing, then enable it back during boot (for example in initramfs).
Such usage relies on assumption that a) most pages ballooned out during
initial boot weren't used at all, and b) even if they were, very few
secrets are in the guest at that time (before any serious userspace
kicks in).
Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT (also
enabled by default), controlling default value for the new runtime
switch.
Signed-off-by: NMarek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

197ecb38

02 9月, 2018 1 次提交

random: make CPU trust a boot parameter · 9b254366

由 Kees Cook 提交于 8月 27, 2018

Instead of forcing a distro or other system builder to choose
at build time whether the CPU is trusted for CRNG seeding via
CONFIG_RANDOM_TRUST_CPU, provide a boot-time parameter for end users to
control the choice. The CONFIG will set the default state instead.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

9b254366

23 8月, 2018 1 次提交

mm: clarify CONFIG_PAGE_POISONING and usage · 8c9a134c

由 Kees Cook 提交于 8月 21, 2018

The Kconfig text for CONFIG_PAGE_POISONING doesn't mention that it has to
be enabled explicitly. This updates the documentation for that and adds a
note about CONFIG_PAGE_POISONING to the "page_poison" command line docs.
While here, change description of CONFIG_PAGE_POISONING_ZERO too, as it's
not "random" data, but rather the fixed debugging value that would be used
when not zeroing. Additionally removes a stray "bool" in the Kconfig.

Link: http://lkml.kernel.org/r/20180725223832.GA43733@beastSigned-off-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8c9a134c

10 8月, 2018 3 次提交

PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support · aaca43fd

由 Logan Gunthorpe 提交于 7月 30, 2018

To support peer-to-peer traffic on a segment of the PCI hierarchy, we must
disable the ACS redirect bits for select PCI bridges.  The bridges must be
selected before the devices are discovered by the kernel and the IOMMU
groups created.  Therefore, add a kernel command line parameter to specify
devices which must have their ACS bits disabled.

The new parameter takes a list of devices separated by a semicolon.  Each
device specified will have its ACS redirect bits disabled.  This is
similar to the existing 'resource_alignment' parameter.

The ACS Request P2P Request Redirect, P2P Completion Redirect and P2P
Egress Control bits are disabled, which is sufficient to always allow
passing P2P traffic uninterrupted.  The bits are set after the kernel
(optionally) enables the ACS bits itself.  It is also done regardless of
whether the kernel or platform firmware sets the bits.

If the user tries to disable the ACS redirect for a device without the ACS
capability, print a warning to dmesg.
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
[bhelgaas: reorder to add the generic code first and move the
device-specific quirk to subsequent patches]
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NStephen Bates <sbates@raithlin.com>
Reviewed-by: NAlex Williamson <alex.williamson@redhat.com>
Acked-by: NChristian König <christian.koenig@amd.com>

aaca43fd

PCI: Allow specifying devices using a base bus and path of devfns · 45db3370

由 Logan Gunthorpe 提交于 7月 30, 2018

When specifying PCI devices on the kernel command line using a
bus/device/function address, bus numbers can change when adding or
replacing a device, changing motherboard firmware, or applying kernel
parameters like "pci=assign-buses".  When bus numbers change, it's likely
the command line tweak will be applied to the wrong device.

Therefore, it is useful to be able to specify devices with a base bus
number and the path of devfns needed to get to it, similar to the "device
scope" structure in the Intel VT-d spec, Section 8.3.1.

Thus, we add an option to specify devices in the following format:

  [<domain>:]<bus>:<device>.<func>[/<device>.<func>]*

The path can be any segment within the PCI hierarchy of any length and
determined through the use of 'lspci -t'.  When specified this way, it is
less likely that a renumbered bus will result in a valid device
specification and the tweak won't be applied to the wrong device.
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
[bhelgaas: use "device" instead of "slot" in documentation since that's the
usual language in the PCI specs]
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NStephen Bates <sbates@raithlin.com>
Reviewed-by: NAlex Williamson <alex.williamson@redhat.com>
Acked-by: NChristian König <christian.koenig@amd.com>

45db3370

PCI: Make specifying PCI devices in kernel parameters reusable · 07d8d7e5

由 Logan Gunthorpe 提交于 7月 30, 2018

Separate out the code to match a PCI device with a string (typically
originating from a kernel parameter) from the
pci_specified_resource_alignment() function into its own helper function.

While we are at it, this change fixes the kernel style of the function
(fixing a number of long lines and extra parentheses).

Additionally, make the analogous change to the kernel parameter
documentation: Separate the description of how to specify a PCI device
into its own section at the head of the "pci=" parameter.

This patch should have no functional alterations.
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
[bhelgaas: use "device" instead of "slot" in documentation since that's the
usual language in the PCI specs]
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NStephen Bates <sbates@raithlin.com>
Reviewed-by: NAlex Williamson <alex.williamson@redhat.com>
Acked-by: NChristian König <christian.koenig@amd.com>

07d8d7e5

07 8月, 2018 1 次提交

Documentation: Add nospectre_v1 parameter · 26cb1f36

由 Diana Craciun 提交于 7月 28, 2018

Currently only supported on powerpc.
Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

26cb1f36

27 7月, 2018 1 次提交

iommu: Add config option to set passthrough as default · 58d11317

由 Olof Johansson 提交于 7月 20, 2018

This allows the default behavior to be controlled by a kernel config
option instead of changing the commandline for the kernel to include
"iommu.passthrough=on" or "iommu=pt" on machines where this is desired.

Likewise, for machines where this config option is enabled, it can be
disabled at boot time with "iommu.passthrough=off" or "iommu=nopt".

Also corrected iommu=pt documentation for IA-64, since it has no code that
parses iommu= at all.
Signed-off-by: NOlof Johansson <olof@lixom.net>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

58d11317

20 7月, 2018 1 次提交

x86/tsc: Redefine notsc to behave as tsc=unstable · fe9af81e

由 Pavel Tatashin 提交于 7月 19, 2018

Currently, the notsc kernel parameter disables the use of the TSC by
sched_clock(). However, this parameter does not prevent the kernel from
accessing tsc in other places.

The only rationale to boot with notsc is to avoid timing discrepancies on
multi-socket systems where TSC are not properly synchronized, and thus
exclude TSC from being used for time keeping. But that prevents using TSC
as sched_clock() as well, which is not necessary as the core sched_clock()
implementation can handle non synchronized TSC based sched clocks just
fine.

However, there is another method to solve the above problem: booting with
tsc=unstable parameter. This parameter allows sched_clock() to use TSC and
just excludes it from timekeeping.

So there is no real reason to keep notsc, but for compatibility reasons the
parameter has to stay. Make it behave like 'tsc=unstable' instead.

[ tglx: Massaged changelog ]
Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDou Liyang <douly.fnst@cn.fujitsu.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com

fe9af81e

18 7月, 2018 1 次提交

vsprintf: Add command line option debug_boot_weak_hash · 3672476e

由 Tobin C. Harding 提交于 6月 22, 2018

Currently printing [hashed] pointers requires enough entropy to be
available.  Early in the boot sequence this may not be the case
resulting in a dummy string '(____ptrval____)' being printed.  This
makes debugging the early boot sequence difficult.  We can relax the
requirement to use cryptographically secure hashing during debugging.
This enables debugging while keeping development/production kernel
behaviour the same.

If new command line option debug_boot_weak_hash is enabled use
cryptographically insecure hashing and hash pointer value immediately.
Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: NTobin C. Harding <me@tobin.cc>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

3672476e

13 7月, 2018 2 次提交

x86/bugs, kvm: Introduce boot-time control of L1TF mitigations · d90a7a0e

由 Jiri Kosina 提交于 7月 13, 2018

Introduce the 'l1tf=' kernel command line option to allow for boot-time
switching of mitigation that is used on processors affected by L1TF.

The possible values are:

  full
	Provides all available mitigations for the L1TF vulnerability. Disables
	SMT and enables all mitigations in the hypervisors. SMT control via
	/sys/devices/system/cpu/smt/control is still possible after boot.
	Hypervisors will issue a warning when the first VM is started in
	a potentially insecure configuration, i.e. SMT enabled or L1D flush
	disabled.

  full,force
	Same as 'full', but disables SMT control. Implies the 'nosmt=force'
	command line option. sysfs control of SMT and the hypervisor flush
	control is disabled.

  flush
	Leaves SMT enabled and enables the conditional hypervisor mitigation.
	Hypervisors will issue a warning when the first VM is started in a
	potentially insecure configuration, i.e. SMT enabled or L1D flush
	disabled.

  flush,nosmt
	Disables SMT and enables the conditional hypervisor mitigation. SMT
	control via /sys/devices/system/cpu/smt/control is still possible
	after boot. If SMT is reenabled or flushing disabled at runtime
	hypervisors will issue a warning.

  flush,nowarn
	Same as 'flush', but hypervisors will not warn when
	a VM is started in a potentially insecure configuration.

  off
	Disables hypervisor mitigations and doesn't emit any warnings.

Default is 'flush'.

Let KVM adhere to these semantics, which means:

  - 'lt1f=full,force'	: Performe L1D flushes. No runtime control
    			  possible.

  - 'l1tf=full'
  - 'l1tf-flush'
  - 'l1tf=flush,nosmt'	: Perform L1D flushes and warn on VM start if
			  SMT has been runtime enabled or L1D flushing
			  has been run-time enabled
			  
  - 'l1tf=flush,nowarn'	: Perform L1D flushes and no warnings are emitted.
  
  - 'l1tf=off'		: L1D flushes are not performed and no warnings
			  are emitted.

KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush'
module parameter except when lt1f=full,force is set.

This makes KVM's private 'nosmt' option redundant, and as it is a bit
non-systematic anyway (this is something to control globally, not on
hypervisor level), remove that option.

Add the missing Documentation entry for the l1tf vulnerability sysfs file
while at it.
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NJiri Kosina <jkosina@suse.cz>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de

d90a7a0e

rcutorture: Change units of onoff_interval to jiffies · 028be12b

由 Paul E. McKenney 提交于 5月 08, 2018

Some RCU bugs have been sensitive to the frequency of CPU-hotplug
operations, which have been gradually increased over time. But this
frequency is now at the one-second lower limit that can be specified using
the rcutorture.onoff_interval kernel parameter. This commit therefore
changes the units of rcutorture.onoff_interval from seconds to jiffies,
and also sets the value specified for this kernel parameter in the TREE03
rcutorture scenario to 200, which is 200 milliseconds for HZ=1000.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

028be12b

11 7月, 2018 2 次提交

Documentation: Add powerpc options for spec_store_bypass_disable · 6b4c1360

由 Michael Ellerman 提交于 7月 10, 2018

Document the support for spec_store_bypass_disable that was added for
powerpc in commit a048a07d ("powerpc/64s: Add support for a store
forwarding barrier at kernel entry/exit").
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NKees Cook <keescook@chromium.org>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NJonathan Corbet <corbet@lwn.net>

6b4c1360

docs: kernel-parameters.txt: document xhci-hcd.quirks parameter · 819d731f

由 Laurentiu Tudor 提交于 7月 05, 2018

This parameter introduced several years ago in the XHCI host controller
driver was somehow left undocumented. Add a few lines in the kernel
parameters text.
Signed-off-by: NLaurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: NJonathan Corbet <corbet@lwn.net>

819d731f

10 7月, 2018 1 次提交

driver core: allow stopping deferred probe after init · 25b4e70d

由 Rob Herring 提交于 7月 09, 2018

Deferred probe will currently wait forever on dependent devices to probe,
but sometimes a driver will never exist. It's also not always critical for
a driver to exist. Platforms can rely on default configuration from the
bootloader or reset defaults for things such as pinctrl and power domains.
This is often the case with initial platform support until various drivers
get enabled. There's at least 2 scenarios where deferred probe can render
a platform broken. Both involve using a DT which has more devices and
dependencies than the kernel supports. The 1st case is a driver may be
disabled in the kernel config. The 2nd case is the kernel version may
simply not have the dependent driver. This can happen if using a newer DT
(provided by firmware perhaps) with a stable kernel version. Deferred
probe issues can be difficult to debug especially if the console has
dependencies or userspace fails to boot to a shell.

There are also cases like IOMMUs where only built-in drivers are
supported, so deferring probe after initcalls is not needed. The IOMMU
subsystem implemented its own mechanism to handle this using OF_DECLARE
linker sections.

This commit adds makes ending deferred probe conditional on initcalls
being completed or a debug timeout. Subsystems or drivers may opt-in by
calling driver_deferred_probe_check_init_done() instead of
unconditionally returning -EPROBE_DEFER. They may use additional
information from DT or kernel's config to decide whether to continue to
defer probe or not.

The timeout mechanism is intended for debug purposes and WARNs loudly.
The remaining deferred probe pending list will also be dumped after the
timeout. Not that this timeout won't work for the console which needs
to be enabled before userspace starts. However, if the console's
dependencies are resolved, then the kernel log will be printed (as
opposed to no output).

Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: NRob Herring <robh@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

25b4e70d

06 7月, 2018 1 次提交

docs: kernel-parameters.txt: document xhci-hcd.quirks parameter · c0addc9a

由 Laurentiu Tudor 提交于 7月 05, 2018

This parameter introduced several years ago in the XHCI host controller
driver was somehow left undocumented. Add a few lines in the kernel
parameters text.
Signed-off-by: NLaurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c0addc9a

05 7月, 2018 2 次提交

x86/KVM/VMX: Add module argument for L1TF mitigation · a399477e

由 Konrad Rzeszutek Wilk 提交于 7月 02, 2018

Add a mitigation mode parameter "vmentry_l1d_flush" for CVE-2018-3620, aka
L1 terminal fault. The valid arguments are:

 - "always" 	L1D cache flush on every VMENTER.
 - "cond"	Conditional L1D cache flush, explained below
 - "never"	Disable the L1D cache flush mitigation

"cond" is trying to avoid L1D cache flushes on VMENTER if the code executed
between VMEXIT and VMENTER is considered safe, i.e. is not bringing any
interesting information into L1D which might exploited.

[ tglx: Split out from a larger patch ]
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

a399477e

x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present · 26acfb66

由 Konrad Rzeszutek Wilk 提交于 6月 20, 2018

If the L1TF CPU bug is present we allow the KVM module to be loaded as the
major of users that use Linux and KVM have trusted guests and do not want a
broken setup.

Cloud vendors are the ones that are uncomfortable with CVE 2018-3620 and as
such they are the ones that should set nosmt to one.

Setting 'nosmt' means that the system administrator also needs to disable
SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line
parameter, or via the /sys/devices/system/cpu/smt/control. See commit
05736e4a ("cpu/hotplug: Provide knobs to control SMT").

Other mitigations are to use task affinity, cpu sets, interrupt binding,
etc - anything to make sure that _only_ the same guests vCPUs are running
on sibling threads.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

26acfb66

04 7月, 2018 1 次提交

usercopy: Allow boot cmdline disabling of hardening · b5cb15d9

由 Chris von Recklinghausen 提交于 7月 03, 2018

Enabling HARDENED_USERCOPY may cause measurable regressions in networking
performance: up to 8% under UDP flood.

I ran a small packet UDP flood using pktgen vs. a host b2b connected. On
the receiver side the UDP packets are processed by a simple user space
process that just reads and drops them:

https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c

Not very useful from a functional PoV, but it helps to pin-point
bottlenecks in the networking stack.

When running a kernel with CONFIG_HARDENED_USERCOPY=y, I see a 5-8%
regression in the receive tput, compared to the same kernel without this
option enabled.

With CONFIG_HARDENED_USERCOPY=y, perf shows ~6% of CPU time spent
cumulatively in __check_object_size (~4%) and __virt_addr_valid (~2%).

The call-chain is:

__GI___libc_recvfrom
entry_SYSCALL_64_after_hwframe
do_syscall_64
__x64_sys_recvfrom
__sys_recvfrom
inet_recvmsg
udp_recvmsg
__check_object_size

udp_recvmsg() actually calls copy_to_iter() (inlined) and the latters
calls check_copy_size() (again, inlined).

A generic distro may want to enable HARDENED_USERCOPY in their default
kernel config, but at the same time, such distro may want to be able to
avoid the performance penalties in with the default configuration and
disable the stricter check on a per-boot basis.

This change adds a boot parameter that conditionally disables
HARDENED_USERCOPY via "hardened_usercopy=off".
Signed-off-by: NChris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: NKees Cook <keescook@chromium.org>

b5cb15d9

02 7月, 2018 1 次提交

Revert "x86/apic: Ignore secondary threads if nosmt=force" · 506a66f3

由 Thomas Gleixner 提交于 6月 29, 2018

Dave Hansen reported, that it's outright dangerous to keep SMT siblings
disabled completely so they are stuck in the BIOS and wait for SIPI.

The reason is that Machine Check Exceptions are broadcasted to siblings and
the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
reboots the machine. The MCE chapter in the SDM contains the following
blurb:

    Because the logical processors within a physical package are tightly
    coupled with respect to shared hardware resources, both logical
    processors are notified of machine check errors that occur within a
    given physical processor. If machine-check exceptions are enabled when
    a fatal error is reported, all the logical processors within a physical
    package are dispatched to the machine-check exception handler. If
    machine-check exceptions are disabled, the logical processors enter the
    shutdown state and assert the IERR# signal. When enabling machine-check
    exceptions, the MCE flag in control register CR4 should be set for each
    logical processor.

Reverting the commit which ignores siblings at enumeration time solves only
half of the problem. The core cpuhotplug logic needs to be adjusted as
well.

This thoughtful engineered mechanism also turns the boot process on all
Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU
before the secondary CPUs are brought up. Depending on the number of
physical cores the window in which this situation can happen is smaller or
larger. On a HSW-EX it's about 750ms:

MCE is enabled on the boot CPU:

[    0.244017] mce: CPU supports 22 MCE banks

The corresponding sibling #72 boots:

[    1.008005] .... node  #0, CPUs:    #72

That means if an MCE hits on physical core 0 (logical CPUs 0 and 72)
between these two points the machine is going to shutdown. At least it's a
known safe state.

It's obvious that the early boot can be hit by an MCE as well and then runs
into the same situation because MCEs are not yet enabled on the boot CPU.
But after enabling them on the boot CPU, it does not make any sense to
prevent the kernel from recovering.

Adjust the nosmt kernel parameter documentation as well.

Reverts: 2207def7 ("x86/apic: Ignore secondary threads if nosmt=force")
Reported-by: NDave Hansen <dave.hansen@intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NTony Luck <tony.luck@intel.com>

506a66f3

30 6月, 2018 1 次提交

PCI: Make early dump functionality generic · 11eb0e0e

由 Sinan Kaya 提交于 6月 04, 2018

Move early dump functionality into common code so that it is available for
all architectures.  No need to carry arch-specific reads around as the read
hooks are already initialized by the time pci_setup_device() is getting
called during scan.
Tested-by: NAndy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: NSinan Kaya <okaya@codeaurora.org>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com>

11eb0e0e

21 6月, 2018 1 次提交

cpu/hotplug: Provide knobs to control SMT · 05736e4a

由 Thomas Gleixner 提交于 5月 29, 2018

Provide a command line and a sysfs knob to control SMT.

The command line options are:

 'nosmt':	Enumerate secondary threads, but do not online them
 		
 'nosmt=force': Ignore secondary threads completely during enumeration
 		via MP table and ACPI/MADT.

The sysfs control file has the following states (read/write):

 'on':		 SMT is enabled. Secondary threads can be freely onlined
 'off':		 SMT is disabled. Secondary threads, even if enumerated
 		 cannot be onlined
 'forceoff':	 SMT is permanentely disabled. Writes to the control
 		 file are rejected.
 'notsupported': SMT is not supported by the CPU

The command line option 'nosmt' sets the sysfs control to 'off'. This
can be changed to 'on' to reenable SMT during runtime.

The command line option 'nosmt=force' sets the sysfs control to
'forceoff'. This cannot be changed during runtime.

When SMT is 'on' and the control file is changed to 'off' then all online
secondary threads are offlined and attempts to online a secondary thread
later on are rejected.

When SMT is 'off' and the control file is changed to 'on' then secondary
threads can be onlined again. The 'off' -> 'on' transition does not
automatically online the secondary threads.

When the control file is set to 'forceoff', the behaviour is the same as
setting it to 'off', but the operation is irreversible and later writes to
the control file are rejected.

When the control status is 'notsupported' then writes to the control file
are rejected.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NIngo Molnar <mingo@kernel.org>

05736e4a

16 6月, 2018 2 次提交

kernel-parameters.txt: fix pointers to sound parameters · 1ca2c806

由 Mauro Carvalho Chehab 提交于 6月 14, 2018

The alsa parameters file was renamed to alsa-configuration.rst.

With regards to OSS, it got retired as a hole by  at changeset
727dede0 ("sound: Retire OSS"). So, it doesn't make sense
to keep mentioning it at kernel-parameters.txt.

Fixes: 727dede0 ("sound: Retire OSS")
Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: NJonathan Corbet <corbet@lwn.net>

1ca2c806

docs: Fix some broken references · 5fb94e9c

由 Mauro Carvalho Chehab 提交于 5月 08, 2018

As we move stuff around, some doc references are broken. Fix some of
them via this script:
	./scripts/documentation-file-ref-check --fix

Manually checked if the produced result is valid, removing a few
false-positives.
Acked-by: NTakashi Iwai <tiwai@suse.de>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Acked-by: NStephen Boyd <sboyd@kernel.org>
Acked-by: NCharles Keepax <ckeepax@opensource.wolfsonmicro.com>
Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: NColy Li <colyli@suse.de>
Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: NJonathan Corbet <corbet@lwn.net>

5fb94e9c

01 6月, 2018 1 次提交

arm64: Add 'ssbd' command-line option · a43ae4df

由 Marc Zyngier 提交于 5月 29, 2018

On a system where the firmware implements ARCH_WORKAROUND_2,
it may be useful to either permanently enable or disable the
workaround for cases where the user decides that they'd rather
not get a trap overhead, and keep the mitigation permanently
on or off instead of switching it on exception entry/exit.

In any case, default to the mitigation being enabled.
Reviewed-by: NJulien Grall <julien.grall@arm.com>
Reviewed-by: NMark Rutland <mark.rutland@arm.com>
Acked-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

a43ae4df

29 5月, 2018 1 次提交

Documentation: document hung_task_panic kernel parameter · a49d9c0a

由 Omar Sandoval 提交于 5月 21, 2018

This parameter has been around since commit e162b39a ("softlockup:
decouple hung tasks check from softlockup detection") in 2009 but was
never documented.
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NJonathan Corbet <corbet@lwn.net>

a49d9c0a

28 5月, 2018 1 次提交

x86/pci-dma: remove the experimental forcesac boot option · 06e9552f

由 Christoph Hellwig 提交于 4月 27, 2018

Limiting the dma mask to avoid PCI (pre-PCIe) DAC cycles while paying
the huge overhead of an IOMMU is rather pointless, and this seriously
gets in the way of dma mapping work.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

06e9552f

21 5月, 2018 1 次提交

docs/vm: transhuge: split userspace bits to admin-guide/mm/transhuge · 45c9a74f

由 Mike Rapoport 提交于 5月 14, 2018

Now that the administrative information for transparent huge pages is
nicely separated, move it to its own page under the admin guide.
Signed-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: NJonathan Corbet <corbet@lwn.net>

45c9a74f

19 5月, 2018 1 次提交

x86/mm: Introduce the 'no5lvl' kernel parameter · 372fddf7

由 Kirill A. Shutemov 提交于 5月 18, 2018

This kernel parameter allows to force kernel to use 4-level paging even
if hardware and kernel support 5-level paging.

The option may be useful to work around regressions related to 5-level
paging.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-5-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

372fddf7

14 5月, 2018 1 次提交

tty: serial: qcom_geni_serial: Add early console support · 43f1831b

由 Karthikeyan Ramasubramanian 提交于 5月 03, 2018

Add early console support in Qualcomm Technologies Inc., GENI based
UART controller.
Signed-off-by: NKarthikeyan Ramasubramanian <kramasub@codeaurora.org>
Signed-off-by: NGirish Mahadevan <girishm@codeaurora.org>
Reviewed-by: NStephen Boyd <swboyd@chromium.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

43f1831b

11 5月, 2018 1 次提交

PCI: Add "pci=noats" boot parameter · cef74409

由 Gil Kupfer 提交于 5月 10, 2018

Adds a "pci=noats" boot parameter.  When supplied, all ATS related
functions fail immediately and the IOMMU is configured to not use
device-IOTLB.

Any function that checks for ATS capabilities directly against the devices
should also check this flag.  Currently, such functions exist only in IOMMU
drivers, and they are covered by this patch.

The motivation behind this patch is the existence of malicious devices.
Lots of research has been done about how to use the IOMMU as protection
from such devices.  When ATS is supported, any I/O device can access any
physical address by faking device-IOTLB entries.  Adding the ability to
ignore these entries lets sysadmins enhance system security.
Signed-off-by: NGil Kupfer <gilkup@cs.technion.ac.il>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NJoerg Roedel <jroedel@suse.de>

cef74409

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功