1. 15 5月, 2019 1 次提交
    • T
      x86/speculation/mds: Add mitigation control for MDS · 29510670
      Thomas Gleixner 提交于
      commit bc1241700acd82ec69fde98c5763ce51086269f8 upstream
      
      Now that the mitigations are in place, add a command line parameter to
      control the mitigation, a mitigation selector function and a SMT update
      mechanism.
      
      This is the minimal straight forward initial implementation which just
      provides an always on/off mode. The command line parameter is:
      
        mds=[full|off]
      
      This is consistent with the existing mitigations for other speculative
      hardware vulnerabilities.
      
      The idle invocation is dynamically updated according to the SMT state of
      the system similar to the dynamic update of the STIBP mitigation. The idle
      mitigation is limited to CPUs which are only affected by MSBDS and not any
      other variant, because the other variants cannot be mitigated on SMT
      enabled systems.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NJon Masters <jcm@redhat.com>
      Tested-by: NJon Masters <jcm@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29510670
  2. 02 5月, 2019 1 次提交
  3. 10 1月, 2019 1 次提交
  4. 06 12月, 2018 4 次提交
    • T
      x86/speculation: Provide IBPB always command line options · 9f3baace
      Thomas Gleixner 提交于
      commit 55a974021ec952ee460dc31ca08722158639de72 upstream
      
      Provide the possibility to enable IBPB always in combination with 'prctl'
      and 'seccomp'.
      
      Add the extra command line options and rework the IBPB selection to
      evaluate the command instead of the mode selected by the STIPB switch case.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185006.144047038@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f3baace
    • T
      x86/speculation: Add seccomp Spectre v2 user space protection mode · d1ec2354
      Thomas Gleixner 提交于
      commit 6b3e64c237c072797a9ec918654a60e3a46488e2 upstream
      
      If 'prctl' mode of user space protection from spectre v2 is selected
      on the kernel command-line, STIBP and IBPB are applied on tasks which
      restrict their indirect branch speculation via prctl.
      
      SECCOMP enables the SSBD mitigation for sandboxed tasks already, so it
      makes sense to prevent spectre v2 user space to user space attacks as
      well.
      
      The Intel mitigation guide documents how STIPB works:
          
         Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
         prevents the predicted targets of indirect branches on any logical
         processor of that core from being controlled by software that executes
         (or executed previously) on another logical processor of the same core.
      
      Ergo setting STIBP protects the task itself from being attacked from a task
      running on a different hyper-thread and protects the tasks running on
      different hyper-threads from being attacked.
      
      While the document suggests that the branch predictors are shielded between
      the logical processors, the observed performance regressions suggest that
      STIBP simply disables the branch predictor more or less completely. Of
      course the document wording is vague, but the fact that there is also no
      requirement for issuing IBPB when STIBP is used points clearly in that
      direction. The kernel still issues IBPB even when STIBP is used until Intel
      clarifies the whole mechanism.
      
      IBPB is issued when the task switches out, so malicious sandbox code cannot
      mistrain the branch predictor for the next user space task on the same
      logical processor.
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1ec2354
    • T
      x86/speculation: Enable prctl mode for spectre_v2_user · 7b62ef14
      Thomas Gleixner 提交于
      commit 7cc765a67d8e04ef7d772425ca5a2a1e2b894c15 upstream
      
      Now that all prerequisites are in place:
      
       - Add the prctl command line option
      
       - Default the 'auto' mode to 'prctl'
      
       - When SMT state changes, update the static key which controls the
         conditional STIBP evaluation on context switch.
      
       - At init update the static key which controls the conditional IBPB
         evaluation on context switch.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.958421388@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b62ef14
    • T
      x86/speculation: Add command line control for indirect branch speculation · 71187543
      Thomas Gleixner 提交于
      commit fa1202ef224391b6f5b26cdd44cc50495e8fab54 upstream
      
      Add command line control for user space indirect branch speculation
      mitigations. The new option is: spectre_v2_user=
      
      The initial options are:
      
          -  on:   Unconditionally enabled
          - off:   Unconditionally disabled
          -auto:   Kernel selects mitigation (default off for now)
      
      When the spectre_v2= command line argument is either 'on' or 'off' this
      implies that the application to application control follows that state even
      if a contradicting spectre_v2_user= argument is supplied.
      Originally-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.082720373@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71187543
  5. 27 11月, 2018 2 次提交
    • K
      USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub · ed8acd13
      Kai-Heng Feng 提交于
      commit 781f0766cc41a9dd2e5d118ef4b1d5d89430257b upstream.
      
      Devices connected under Terminus Technology Inc. Hub (1a40:0101) may
      fail to work after the system resumes from suspend:
      [  206.063325] usb 3-2.4: reset full-speed USB device number 4 using xhci_hcd
      [  206.143691] usb 3-2.4: device descriptor read/64, error -32
      [  206.351671] usb 3-2.4: device descriptor read/64, error -32
      
      Info for this hub:
      T:  Bus=03 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  2 Spd=480 MxCh= 4
      D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=01 MxPS=64 #Cfgs=  1
      P:  Vendor=1a40 ProdID=0101 Rev=01.11
      S:  Product=USB 2.0 Hub
      C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=100mA
      I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
      
      Some expirements indicate that the USB devices connected to the hub are
      innocent, it's the hub itself is to blame. The hub needs extra delay
      time after it resets its port.
      
      Hence wait for extra delay, if the device is connected to this quirky
      hub.
      Signed-off-by: NKai-Heng Feng <kai.heng.feng@canonical.com>
      Cc: stable <stable@vger.kernel.org>
      Acked-by: NAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed8acd13
    • F
      x86/earlyprintk: Add a force option for pciserial device · 9f0e46bf
      Feng Tang 提交于
      [ Upstream commit d2266bbfa9e3e32e3b642965088ca461bd24a94f ]
      
      The "pciserial" earlyprintk variant helps much on many modern x86
      platforms, but unfortunately there are still some platforms with PCI
      UART devices which have the wrong PCI class code. In that case, the
      current class code check does not allow for them to be used for logging.
      
      Add a sub-option "force" which overrides the class code check and thus
      the use of such device can be enforced.
      
       [ bp: massage formulations. ]
      Suggested-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NFeng Tang <feng.tang@intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Stuart R . Anderson" <stuart.r.anderson@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H Peter Anvin <hpa@linux.intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thymo van Beers <thymovanbeers@gmail.com>
      Cc: alan@linux.intel.com
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181002164921.25833-1-feng.tang@intel.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      9f0e46bf
  6. 14 9月, 2018 1 次提交
    • M
      xen/balloon: add runtime control for scrubbing ballooned out pages · 197ecb38
      Marek Marczykowski-Górecki 提交于
      Scrubbing pages on initial balloon down can take some time, especially
      in nested virtualization case (nested EPT is slow). When HVM/PVH guest is
      started with memory= significantly lower than maxmem=, all the extra
      pages will be scrubbed before returning to Xen. But since most of them
      weren't used at all at that point, Xen needs to populate them first
      (from populate-on-demand pool). In nested virt case (Xen inside KVM)
      this slows down the guest boot by 15-30s with just 1.5GB needed to be
      returned to Xen.
      
      Add runtime parameter to enable/disable it, to allow initially disabling
      scrubbing, then enable it back during boot (for example in initramfs).
      Such usage relies on assumption that a) most pages ballooned out during
      initial boot weren't used at all, and b) even if they were, very few
      secrets are in the guest at that time (before any serious userspace
      kicks in).
      Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT (also
      enabled by default), controlling default value for the new runtime
      switch.
      Signed-off-by: NMarek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      197ecb38
  7. 02 9月, 2018 1 次提交
  8. 23 8月, 2018 1 次提交
  9. 10 8月, 2018 3 次提交
    • L
      PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support · aaca43fd
      Logan Gunthorpe 提交于
      To support peer-to-peer traffic on a segment of the PCI hierarchy, we must
      disable the ACS redirect bits for select PCI bridges.  The bridges must be
      selected before the devices are discovered by the kernel and the IOMMU
      groups created.  Therefore, add a kernel command line parameter to specify
      devices which must have their ACS bits disabled.
      
      The new parameter takes a list of devices separated by a semicolon.  Each
      device specified will have its ACS redirect bits disabled.  This is
      similar to the existing 'resource_alignment' parameter.
      
      The ACS Request P2P Request Redirect, P2P Completion Redirect and P2P
      Egress Control bits are disabled, which is sufficient to always allow
      passing P2P traffic uninterrupted.  The bits are set after the kernel
      (optionally) enables the ACS bits itself.  It is also done regardless of
      whether the kernel or platform firmware sets the bits.
      
      If the user tries to disable the ACS redirect for a device without the ACS
      capability, print a warning to dmesg.
      Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
      [bhelgaas: reorder to add the generic code first and move the
      device-specific quirk to subsequent patches]
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NStephen Bates <sbates@raithlin.com>
      Reviewed-by: NAlex Williamson <alex.williamson@redhat.com>
      Acked-by: NChristian König <christian.koenig@amd.com>
      aaca43fd
    • L
      PCI: Allow specifying devices using a base bus and path of devfns · 45db3370
      Logan Gunthorpe 提交于
      When specifying PCI devices on the kernel command line using a
      bus/device/function address, bus numbers can change when adding or
      replacing a device, changing motherboard firmware, or applying kernel
      parameters like "pci=assign-buses".  When bus numbers change, it's likely
      the command line tweak will be applied to the wrong device.
      
      Therefore, it is useful to be able to specify devices with a base bus
      number and the path of devfns needed to get to it, similar to the "device
      scope" structure in the Intel VT-d spec, Section 8.3.1.
      
      Thus, we add an option to specify devices in the following format:
      
        [<domain>:]<bus>:<device>.<func>[/<device>.<func>]*
      
      The path can be any segment within the PCI hierarchy of any length and
      determined through the use of 'lspci -t'.  When specified this way, it is
      less likely that a renumbered bus will result in a valid device
      specification and the tweak won't be applied to the wrong device.
      Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
      [bhelgaas: use "device" instead of "slot" in documentation since that's the
      usual language in the PCI specs]
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NStephen Bates <sbates@raithlin.com>
      Reviewed-by: NAlex Williamson <alex.williamson@redhat.com>
      Acked-by: NChristian König <christian.koenig@amd.com>
      45db3370
    • L
      PCI: Make specifying PCI devices in kernel parameters reusable · 07d8d7e5
      Logan Gunthorpe 提交于
      Separate out the code to match a PCI device with a string (typically
      originating from a kernel parameter) from the
      pci_specified_resource_alignment() function into its own helper function.
      
      While we are at it, this change fixes the kernel style of the function
      (fixing a number of long lines and extra parentheses).
      
      Additionally, make the analogous change to the kernel parameter
      documentation: Separate the description of how to specify a PCI device
      into its own section at the head of the "pci=" parameter.
      
      This patch should have no functional alterations.
      Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
      [bhelgaas: use "device" instead of "slot" in documentation since that's the
      usual language in the PCI specs]
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NStephen Bates <sbates@raithlin.com>
      Reviewed-by: NAlex Williamson <alex.williamson@redhat.com>
      Acked-by: NChristian König <christian.koenig@amd.com>
      07d8d7e5
  10. 07 8月, 2018 1 次提交
  11. 27 7月, 2018 1 次提交
    • O
      iommu: Add config option to set passthrough as default · 58d11317
      Olof Johansson 提交于
      This allows the default behavior to be controlled by a kernel config
      option instead of changing the commandline for the kernel to include
      "iommu.passthrough=on" or "iommu=pt" on machines where this is desired.
      
      Likewise, for machines where this config option is enabled, it can be
      disabled at boot time with "iommu.passthrough=off" or "iommu=nopt".
      
      Also corrected iommu=pt documentation for IA-64, since it has no code that
      parses iommu= at all.
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      58d11317
  12. 20 7月, 2018 1 次提交
    • P
      x86/tsc: Redefine notsc to behave as tsc=unstable · fe9af81e
      Pavel Tatashin 提交于
      Currently, the notsc kernel parameter disables the use of the TSC by
      sched_clock(). However, this parameter does not prevent the kernel from
      accessing tsc in other places.
      
      The only rationale to boot with notsc is to avoid timing discrepancies on
      multi-socket systems where TSC are not properly synchronized, and thus
      exclude TSC from being used for time keeping. But that prevents using TSC
      as sched_clock() as well, which is not necessary as the core sched_clock()
      implementation can handle non synchronized TSC based sched clocks just
      fine.
      
      However, there is another method to solve the above problem: booting with
      tsc=unstable parameter. This parameter allows sched_clock() to use TSC and
      just excludes it from timekeeping.
      
      So there is no real reason to keep notsc, but for compatibility reasons the
      parameter has to stay. Make it behave like 'tsc=unstable' instead.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NDou Liyang <douly.fnst@cn.fujitsu.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com
      fe9af81e
  13. 18 7月, 2018 1 次提交
  14. 13 7月, 2018 2 次提交
    • J
      x86/bugs, kvm: Introduce boot-time control of L1TF mitigations · d90a7a0e
      Jiri Kosina 提交于
      Introduce the 'l1tf=' kernel command line option to allow for boot-time
      switching of mitigation that is used on processors affected by L1TF.
      
      The possible values are:
      
        full
      	Provides all available mitigations for the L1TF vulnerability. Disables
      	SMT and enables all mitigations in the hypervisors. SMT control via
      	/sys/devices/system/cpu/smt/control is still possible after boot.
      	Hypervisors will issue a warning when the first VM is started in
      	a potentially insecure configuration, i.e. SMT enabled or L1D flush
      	disabled.
      
        full,force
      	Same as 'full', but disables SMT control. Implies the 'nosmt=force'
      	command line option. sysfs control of SMT and the hypervisor flush
      	control is disabled.
      
        flush
      	Leaves SMT enabled and enables the conditional hypervisor mitigation.
      	Hypervisors will issue a warning when the first VM is started in a
      	potentially insecure configuration, i.e. SMT enabled or L1D flush
      	disabled.
      
        flush,nosmt
      	Disables SMT and enables the conditional hypervisor mitigation. SMT
      	control via /sys/devices/system/cpu/smt/control is still possible
      	after boot. If SMT is reenabled or flushing disabled at runtime
      	hypervisors will issue a warning.
      
        flush,nowarn
      	Same as 'flush', but hypervisors will not warn when
      	a VM is started in a potentially insecure configuration.
      
        off
      	Disables hypervisor mitigations and doesn't emit any warnings.
      
      Default is 'flush'.
      
      Let KVM adhere to these semantics, which means:
      
        - 'lt1f=full,force'	: Performe L1D flushes. No runtime control
          			  possible.
      
        - 'l1tf=full'
        - 'l1tf-flush'
        - 'l1tf=flush,nosmt'	: Perform L1D flushes and warn on VM start if
      			  SMT has been runtime enabled or L1D flushing
      			  has been run-time enabled
      			  
        - 'l1tf=flush,nowarn'	: Perform L1D flushes and no warnings are emitted.
        
        - 'l1tf=off'		: L1D flushes are not performed and no warnings
      			  are emitted.
      
      KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush'
      module parameter except when lt1f=full,force is set.
      
      This makes KVM's private 'nosmt' option redundant, and as it is a bit
      non-systematic anyway (this is something to control globally, not on
      hypervisor level), remove that option.
      
      Add the missing Documentation entry for the l1tf vulnerability sysfs file
      while at it.
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJiri Kosina <jkosina@suse.cz>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
      d90a7a0e
    • P
      rcutorture: Change units of onoff_interval to jiffies · 028be12b
      Paul E. McKenney 提交于
      Some RCU bugs have been sensitive to the frequency of CPU-hotplug
      operations, which have been gradually increased over time.  But this
      frequency is now at the one-second lower limit that can be specified using
      the rcutorture.onoff_interval kernel parameter.  This commit therefore
      changes the units of rcutorture.onoff_interval from seconds to jiffies,
      and also sets the value specified for this kernel parameter in the TREE03
      rcutorture scenario to 200, which is 200 milliseconds for HZ=1000.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      028be12b
  15. 11 7月, 2018 2 次提交
  16. 10 7月, 2018 1 次提交
    • R
      driver core: allow stopping deferred probe after init · 25b4e70d
      Rob Herring 提交于
      Deferred probe will currently wait forever on dependent devices to probe,
      but sometimes a driver will never exist. It's also not always critical for
      a driver to exist. Platforms can rely on default configuration from the
      bootloader or reset defaults for things such as pinctrl and power domains.
      This is often the case with initial platform support until various drivers
      get enabled. There's at least 2 scenarios where deferred probe can render
      a platform broken. Both involve using a DT which has more devices and
      dependencies than the kernel supports. The 1st case is a driver may be
      disabled in the kernel config. The 2nd case is the kernel version may
      simply not have the dependent driver. This can happen if using a newer DT
      (provided by firmware perhaps) with a stable kernel version. Deferred
      probe issues can be difficult to debug especially if the console has
      dependencies or userspace fails to boot to a shell.
      
      There are also cases like IOMMUs where only built-in drivers are
      supported, so deferring probe after initcalls is not needed. The IOMMU
      subsystem implemented its own mechanism to handle this using OF_DECLARE
      linker sections.
      
      This commit adds makes ending deferred probe conditional on initcalls
      being completed or a debug timeout. Subsystems or drivers may opt-in by
      calling driver_deferred_probe_check_init_done() instead of
      unconditionally returning -EPROBE_DEFER. They may use additional
      information from DT or kernel's config to decide whether to continue to
      defer probe or not.
      
      The timeout mechanism is intended for debug purposes and WARNs loudly.
      The remaining deferred probe pending list will also be dumped after the
      timeout. Not that this timeout won't work for the console which needs
      to be enabled before userspace starts. However, if the console's
      dependencies are resolved, then the kernel log will be printed (as
      opposed to no output).
      
      Cc: Alexander Graf <agraf@suse.de>
      Signed-off-by: NRob Herring <robh@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      25b4e70d
  17. 06 7月, 2018 1 次提交
  18. 05 7月, 2018 2 次提交
    • K
      x86/KVM/VMX: Add module argument for L1TF mitigation · a399477e
      Konrad Rzeszutek Wilk 提交于
      Add a mitigation mode parameter "vmentry_l1d_flush" for CVE-2018-3620, aka
      L1 terminal fault. The valid arguments are:
      
       - "always" 	L1D cache flush on every VMENTER.
       - "cond"	Conditional L1D cache flush, explained below
       - "never"	Disable the L1D cache flush mitigation
      
      "cond" is trying to avoid L1D cache flushes on VMENTER if the code executed
      between VMEXIT and VMENTER is considered safe, i.e. is not bringing any
      interesting information into L1D which might exploited.
      
      [ tglx: Split out from a larger patch ]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      a399477e
    • K
      x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present · 26acfb66
      Konrad Rzeszutek Wilk 提交于
      If the L1TF CPU bug is present we allow the KVM module to be loaded as the
      major of users that use Linux and KVM have trusted guests and do not want a
      broken setup.
      
      Cloud vendors are the ones that are uncomfortable with CVE 2018-3620 and as
      such they are the ones that should set nosmt to one.
      
      Setting 'nosmt' means that the system administrator also needs to disable
      SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line
      parameter, or via the /sys/devices/system/cpu/smt/control. See commit
      05736e4a ("cpu/hotplug: Provide knobs to control SMT").
      
      Other mitigations are to use task affinity, cpu sets, interrupt binding,
      etc - anything to make sure that _only_ the same guests vCPUs are running
      on sibling threads.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      26acfb66
  19. 04 7月, 2018 1 次提交
    • C
      usercopy: Allow boot cmdline disabling of hardening · b5cb15d9
      Chris von Recklinghausen 提交于
      Enabling HARDENED_USERCOPY may cause measurable regressions in networking
      performance: up to 8% under UDP flood.
      
      I ran a small packet UDP flood using pktgen vs. a host b2b connected. On
      the receiver side the UDP packets are processed by a simple user space
      process that just reads and drops them:
      
      https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c
      
      Not very useful from a functional PoV, but it helps to pin-point
      bottlenecks in the networking stack.
      
      When running a kernel with CONFIG_HARDENED_USERCOPY=y, I see a 5-8%
      regression in the receive tput, compared to the same kernel without this
      option enabled.
      
      With CONFIG_HARDENED_USERCOPY=y, perf shows ~6% of CPU time spent
      cumulatively in __check_object_size (~4%) and __virt_addr_valid (~2%).
      
      The call-chain is:
      
      __GI___libc_recvfrom
      entry_SYSCALL_64_after_hwframe
      do_syscall_64
      __x64_sys_recvfrom
      __sys_recvfrom
      inet_recvmsg
      udp_recvmsg
      __check_object_size
      
      udp_recvmsg() actually calls copy_to_iter() (inlined) and the latters
      calls check_copy_size() (again, inlined).
      
      A generic distro may want to enable HARDENED_USERCOPY in their default
      kernel config, but at the same time, such distro may want to be able to
      avoid the performance penalties in with the default configuration and
      disable the stricter check on a per-boot basis.
      
      This change adds a boot parameter that conditionally disables
      HARDENED_USERCOPY via "hardened_usercopy=off".
      Signed-off-by: NChris von Recklinghausen <crecklin@redhat.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      b5cb15d9
  20. 02 7月, 2018 1 次提交
    • T
      Revert "x86/apic: Ignore secondary threads if nosmt=force" · 506a66f3
      Thomas Gleixner 提交于
      Dave Hansen reported, that it's outright dangerous to keep SMT siblings
      disabled completely so they are stuck in the BIOS and wait for SIPI.
      
      The reason is that Machine Check Exceptions are broadcasted to siblings and
      the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
      logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
      reboots the machine. The MCE chapter in the SDM contains the following
      blurb:
      
          Because the logical processors within a physical package are tightly
          coupled with respect to shared hardware resources, both logical
          processors are notified of machine check errors that occur within a
          given physical processor. If machine-check exceptions are enabled when
          a fatal error is reported, all the logical processors within a physical
          package are dispatched to the machine-check exception handler. If
          machine-check exceptions are disabled, the logical processors enter the
          shutdown state and assert the IERR# signal. When enabling machine-check
          exceptions, the MCE flag in control register CR4 should be set for each
          logical processor.
      
      Reverting the commit which ignores siblings at enumeration time solves only
      half of the problem. The core cpuhotplug logic needs to be adjusted as
      well.
      
      This thoughtful engineered mechanism also turns the boot process on all
      Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU
      before the secondary CPUs are brought up. Depending on the number of
      physical cores the window in which this situation can happen is smaller or
      larger. On a HSW-EX it's about 750ms:
      
      MCE is enabled on the boot CPU:
      
      [    0.244017] mce: CPU supports 22 MCE banks
      
      The corresponding sibling #72 boots:
      
      [    1.008005] .... node  #0, CPUs:    #72
      
      That means if an MCE hits on physical core 0 (logical CPUs 0 and 72)
      between these two points the machine is going to shutdown. At least it's a
      known safe state.
      
      It's obvious that the early boot can be hit by an MCE as well and then runs
      into the same situation because MCEs are not yet enabled on the boot CPU.
      But after enabling them on the boot CPU, it does not make any sense to
      prevent the kernel from recovering.
      
      Adjust the nosmt kernel parameter documentation as well.
      
      Reverts: 2207def7 ("x86/apic: Ignore secondary threads if nosmt=force")
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NTony Luck <tony.luck@intel.com>
      506a66f3
  21. 30 6月, 2018 1 次提交
  22. 21 6月, 2018 1 次提交
    • T
      cpu/hotplug: Provide knobs to control SMT · 05736e4a
      Thomas Gleixner 提交于
      Provide a command line and a sysfs knob to control SMT.
      
      The command line options are:
      
       'nosmt':	Enumerate secondary threads, but do not online them
       		
       'nosmt=force': Ignore secondary threads completely during enumeration
       		via MP table and ACPI/MADT.
      
      The sysfs control file has the following states (read/write):
      
       'on':		 SMT is enabled. Secondary threads can be freely onlined
       'off':		 SMT is disabled. Secondary threads, even if enumerated
       		 cannot be onlined
       'forceoff':	 SMT is permanentely disabled. Writes to the control
       		 file are rejected.
       'notsupported': SMT is not supported by the CPU
      
      The command line option 'nosmt' sets the sysfs control to 'off'. This
      can be changed to 'on' to reenable SMT during runtime.
      
      The command line option 'nosmt=force' sets the sysfs control to
      'forceoff'. This cannot be changed during runtime.
      
      When SMT is 'on' and the control file is changed to 'off' then all online
      secondary threads are offlined and attempts to online a secondary thread
      later on are rejected.
      
      When SMT is 'off' and the control file is changed to 'on' then secondary
      threads can be onlined again. The 'off' -> 'on' transition does not
      automatically online the secondary threads.
      
      When the control file is set to 'forceoff', the behaviour is the same as
      setting it to 'off', but the operation is irreversible and later writes to
      the control file are rejected.
      
      When the control status is 'notsupported' then writes to the control file
      are rejected.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      05736e4a
  23. 16 6月, 2018 2 次提交
  24. 01 6月, 2018 1 次提交
  25. 29 5月, 2018 1 次提交
  26. 28 5月, 2018 1 次提交
  27. 21 5月, 2018 1 次提交
  28. 19 5月, 2018 1 次提交
  29. 14 5月, 2018 1 次提交
  30. 11 5月, 2018 1 次提交
    • G
      PCI: Add "pci=noats" boot parameter · cef74409
      Gil Kupfer 提交于
      Adds a "pci=noats" boot parameter.  When supplied, all ATS related
      functions fail immediately and the IOMMU is configured to not use
      device-IOTLB.
      
      Any function that checks for ATS capabilities directly against the devices
      should also check this flag.  Currently, such functions exist only in IOMMU
      drivers, and they are covered by this patch.
      
      The motivation behind this patch is the existence of malicious devices.
      Lots of research has been done about how to use the IOMMU as protection
      from such devices.  When ATS is supported, any I/O device can access any
      physical address by faking device-IOTLB entries.  Adding the ability to
      ignore these entries lets sysadmins enhance system security.
      Signed-off-by: NGil Kupfer <gilkup@cs.technion.ac.il>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NJoerg Roedel <jroedel@suse.de>
      cef74409