1. 26 Apr, 2019 1 commit
  2. 10 Jan, 2019 1 commit
  3. 06 Dec, 2018 4 commits
    • T
      x86/speculation: Provide IBPB always command line options · 9f3baace
      Thomas Gleixner committed
      commit 55a974021ec952ee460dc31ca08722158639de72 upstream
      
      Provide the possibility to enable IBPB always in combination with 'prctl'
      and 'seccomp'.
      
      Add the extra command line options and rework the IBPB selection to
      evaluate the command instead of the mode selected by the STIBP switch case.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185006.144047038@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f3baace
    • T
      x86/speculation: Add seccomp Spectre v2 user space protection mode · d1ec2354
      Thomas Gleixner committed
      commit 6b3e64c237c072797a9ec918654a60e3a46488e2 upstream
      
      If 'prctl' mode of user space protection from spectre v2 is selected
      on the kernel command-line, STIBP and IBPB are applied on tasks which
      restrict their indirect branch speculation via prctl.
      
      SECCOMP enables the SSBD mitigation for sandboxed tasks already, so it
      makes sense to prevent spectre v2 user space to user space attacks as
      well.
      
      The Intel mitigation guide documents how STIBP works:
          
         Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
         prevents the predicted targets of indirect branches on any logical
         processor of that core from being controlled by software that executes
         (or executed previously) on another logical processor of the same core.
      
      Ergo setting STIBP protects the task itself from being attacked from a task
      running on a different hyper-thread and protects the tasks running on
      different hyper-threads from being attacked.
      
      While the document suggests that the branch predictors are shielded between
      the logical processors, the observed performance regressions suggest that
      STIBP simply disables the branch predictor more or less completely. Of
      course the document wording is vague, but the fact that there is also no
      requirement for issuing IBPB when STIBP is used points clearly in that
      direction. The kernel still issues IBPB even when STIBP is used until Intel
      clarifies the whole mechanism.
      
      IBPB is issued when the task switches out, so malicious sandbox code cannot
      mistrain the branch predictor for the next user space task on the same
      logical processor.
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1ec2354
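The bit layout quoted from the Intel guide in the commit above can be illustrated with a small sketch. The MSR index (0x48) and the neighbouring IBRS and SSBD bit positions come from Intel's public documentation, not from this commit message. The helper name `spec_ctrl_for_task` and its boolean argument are hypothetical; they only model "a task that restricted its indirect branch speculation via prctl or seccomp" and are not the kernel's actual code.

```python
# Illustrative model only; the kernel implements this in C on context switch.
MSR_IA32_SPEC_CTRL = 0x48      # MSR index (Intel documentation)
SPEC_CTRL_IBRS = 1 << 0        # bit 0
SPEC_CTRL_STIBP = 1 << 1       # bit 1, as quoted in the commit message
SPEC_CTRL_SSBD = 1 << 2        # bit 2


def spec_ctrl_for_task(base: int, restricts_indirect_branches: bool) -> int:
    """Return the SPEC_CTRL value to use while a task runs.

    `base` is the value shared by all tasks. STIBP is set only for tasks
    that asked for indirect branch speculation to be restricted
    (hypothetical flag standing in for the prctl/seccomp state).
    """
    value = base & ~SPEC_CTRL_STIBP
    if restricts_indirect_branches:
        value |= SPEC_CTRL_STIBP
    return value
```

Other bits in `base`, such as SSBD, pass through unchanged; only the STIBP bit is recomputed per task.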
    • T
      x86/speculation: Enable prctl mode for spectre_v2_user · 7b62ef14
      Thomas Gleixner committed
      commit 7cc765a67d8e04ef7d772425ca5a2a1e2b894c15 upstream
      
      Now that all prerequisites are in place:
      
       - Add the prctl command line option
      
       - Default the 'auto' mode to 'prctl'
      
       - When SMT state changes, update the static key which controls the
         conditional STIBP evaluation on context switch.
      
       - At init update the static key which controls the conditional IBPB
         evaluation on context switch.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.958421388@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b62ef14
    • T
      x86/speculation: Add command line control for indirect branch speculation · 71187543
      Thomas Gleixner committed
      commit fa1202ef224391b6f5b26cdd44cc50495e8fab54 upstream
      
      Add command line control for user space indirect branch speculation
      mitigations. The new option is: spectre_v2_user=
      
      The initial options are:
      
          - on:    Unconditionally enabled
          - off:   Unconditionally disabled
          - auto:  Kernel selects mitigation (default off for now)
      
      When the spectre_v2= command line argument is either 'on' or 'off' this
      implies that the application to application control follows that state even
      if a contradicting spectre_v2_user= argument is supplied.
      Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.082720373@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      71187543
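The precedence rule described in the commit above (a 'spectre_v2=' value of 'on' or 'off' wins over any 'spectre_v2_user=' value) can be sketched as follows. This is a simplified model under stated assumptions: the function names are hypothetical, only the three initial options are accepted, and 'auto' is returned unresolved because what the kernel selects for it changed in later commits.

```python
USER_MODES = {"on", "off", "auto"}  # the initial options listed in the commit


def parse_cmdline(cmdline: str) -> dict:
    """Collect key=value tokens from a kernel command line string."""
    opts = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:
            opts[key] = value
    return opts


def effective_user_mode(cmdline: str) -> str:
    """Return the user space mitigation mode implied by the command line."""
    opts = parse_cmdline(cmdline)
    v2 = opts.get("spectre_v2", "auto")
    user = opts.get("spectre_v2_user", "auto")
    if user not in USER_MODES:
        raise ValueError(f"unknown spectre_v2_user value: {user!r}")
    # 'on' or 'off' for spectre_v2= overrides a contradicting
    # spectre_v2_user= argument.
    if v2 in ("on", "off"):
        return v2
    return user
```

For example, `effective_user_mode("spectre_v2=off spectre_v2_user=on")` yields `"off"`, because the global switch takes precedence.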
  4. 01 Dec, 2018 2 commits
    • W
      Documentation/security-bugs: Postpone fix publication in exceptional cases · bcec3b85
      Will Deacon committed
      commit 544b03da39e2d7b4961d3163976ed4bfb1fac509 upstream.
      
      At the request of the reporter, the Linux kernel security team offers to
      postpone the publishing of a fix for up to 5 business days from the date
      of a report.
      
      While it is generally undesirable to keep a fix private after it has
      been developed, this short window is intended to allow distributions to
      package the fix into their kernel builds and permits early inclusion of
      the security team in the case of a co-ordinated disclosure with other
      parties. Unfortunately, discussions with major Linux distributions and
      cloud providers have revealed that 5 business days is not sufficient to
      achieve either of these two goals.
      
      As an example, cloud providers need to roll out KVM security fixes to a
      global fleet of hosts with sufficient early ramp-up and monitoring. An
      end-to-end timeline of less than two weeks dramatically cuts into the
      amount of early validation and increases the chance of guest-visible
      regressions.
      
      The consequence of this timeline mismatch is that security issues are
      commonly fixed without the involvement of the Linux kernel security team
      and are instead analysed and addressed by an ad-hoc group of developers
      across companies contributing to Linux. In some cases, mainline (and
      therefore the official stable kernels) can be left to languish for
      extended periods of time. This undermines the Linux kernel security
      process and puts upstream developers in a difficult position should they
      find themselves involved with an undisclosed security problem that they
      are unable to report due to restrictions from their employer.
      
      To accommodate the needs of these users of the Linux kernel and
      encourage them to engage with the Linux security team when security
      issues are first uncovered, extend the maximum period for which fixes
      may be delayed to 7 calendar days, or 14 calendar days in exceptional
      cases, where the logistics of QA and large scale rollouts specifically
      need to be accommodated. This brings parity with the linux-distros@
      maximum embargo period of 14 calendar days.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Amit Shah <aams@amazon.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
      Co-developed-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bcec3b85
    • W
      Documentation/security-bugs: Clarify treatment of embargoed information · 160a390a
      Will Deacon committed
      commit 14fdc2c5318ae420e68496975f48dc1dbef52649 upstream.
      
      The Linux kernel security team has been accused of rejecting the idea of
      security embargoes. This is incorrect, and could dissuade people from
      reporting security issues to us under the false assumption that the
      issue would leak prematurely.
      
      Clarify the handling of embargoed information in our process
      documentation.
      Co-developed-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Laura Abbott <labbott@redhat.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      160a390a
  5. 27 Nov, 2018 2 commits
    • K
      USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub · ed8acd13
      Kai-Heng Feng committed
      commit 781f0766cc41a9dd2e5d118ef4b1d5d89430257b upstream.
      
      Devices connected under Terminus Technology Inc. Hub (1a40:0101) may
      fail to work after the system resumes from suspend:
      [  206.063325] usb 3-2.4: reset full-speed USB device number 4 using xhci_hcd
      [  206.143691] usb 3-2.4: device descriptor read/64, error -32
      [  206.351671] usb 3-2.4: device descriptor read/64, error -32
      
      Info for this hub:
      T:  Bus=03 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  2 Spd=480 MxCh= 4
      D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=01 MxPS=64 #Cfgs=  1
      P:  Vendor=1a40 ProdID=0101 Rev=01.11
      S:  Product=USB 2.0 Hub
      C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=100mA
      I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
      
      Some experiments indicate that the USB devices connected to the hub are
      innocent; the hub itself is to blame. The hub needs extra delay
      time after it resets its port.
      
      Hence wait for extra delay, if the device is connected to this quirky
      hub.
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Cc: stable <stable@vger.kernel.org>
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed8acd13
    • F
      x86/earlyprintk: Add a force option for pciserial device · 9f0e46bf
      Feng Tang committed
      [ Upstream commit d2266bbfa9e3e32e3b642965088ca461bd24a94f ]
      
      The "pciserial" earlyprintk variant helps much on many modern x86
      platforms, but unfortunately there are still some platforms with PCI
      UART devices which have the wrong PCI class code. In that case, the
      current class code check does not allow for them to be used for logging.
      
      Add a sub-option "force" which overrides the class code check and thus
      the use of such device can be enforced.
      
       [ bp: massage formulations. ]
      Suggested-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Stuart R . Anderson" <stuart.r.anderson@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H Peter Anvin <hpa@linux.intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thymo van Beers <thymovanbeers@gmail.com>
      Cc: alan@linux.intel.com
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181002164921.25833-1-feng.tang@intel.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9f0e46bf
  6. 14 Sep, 2018 1 commit
    • M
      xen/balloon: add runtime control for scrubbing ballooned out pages · 197ecb38
      Marek Marczykowski-Górecki committed
      Scrubbing pages on initial balloon down can take some time, especially
      in nested virtualization case (nested EPT is slow). When HVM/PVH guest is
      started with memory= significantly lower than maxmem=, all the extra
      pages will be scrubbed before returning to Xen. But since most of them
      weren't used at all at that point, Xen needs to populate them first
      (from populate-on-demand pool). In nested virt case (Xen inside KVM)
      this slows down the guest boot by 15-30s with just 1.5GB needed to be
      returned to Xen.
      
      Add runtime parameter to enable/disable it, to allow initially disabling
      scrubbing, then enable it back during boot (for example in initramfs).
      Such usage relies on assumption that a) most pages ballooned out during
      initial boot weren't used at all, and b) even if they were, very few
      secrets are in the guest at that time (before any serious userspace
      kicks in).
      Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT (also
      enabled by default), controlling default value for the new runtime
      switch.
      Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      197ecb38
  7. 02 Sep, 2018 1 commit
  8. 23 Aug, 2018 2 commits
    • R
      mm, oom: introduce memory.oom.group · 3d8b38eb
      Roman Gushchin committed
      For some workloads an intervention from the OOM killer can be painful.
      Killing a random task can bring the workload into an inconsistent state.
      
      Historically, there are two common solutions for this
      problem:
      1) enabling panic_on_oom,
      2) using a userspace daemon to monitor OOMs and kill
         all outstanding processes.
      
      Both approaches have their downsides: rebooting on each OOM is an obvious
      waste of capacity, and handling all in userspace is tricky and requires a
      userspace agent, which will monitor all cgroups for OOMs.
      
      In most cases an in-kernel after-OOM cleaning-up mechanism can eliminate
      the necessity of enabling panic_on_oom.  Also, it can simplify the cgroup
      management for userspace applications.
      
      This commit introduces a new knob for cgroup v2 memory controller:
      memory.oom.group.  The knob determines whether the cgroup should be
      treated as an indivisible workload by the OOM killer.  If set, all tasks
      belonging to the cgroup or to its descendants (if the memory cgroup is not
      a leaf cgroup) are killed together or not at all.
      
      To determine which cgroup has to be killed, we traverse the cgroup
      hierarchy from the victim task's cgroup up to the OOMing cgroup (or root)
      and look for the highest-level cgroup with memory.oom.group set.
      
      Tasks with the OOM protection (oom_score_adj set to -1000) are treated as
      an exception and are never killed.
      
      This patch doesn't change the OOM victim selection algorithm.
      
      Link: http://lkml.kernel.org/r/20180802003201.817-4-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3d8b38eb
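The hierarchy walk described in the memory.oom.group commit above can be sketched in a few lines. This is an illustrative model, not the kernel implementation: the `Cgroup` class and `find_oom_group` function are hypothetical names, and details such as special handling of the root cgroup or of tasks with oom_score_adj set to -1000 are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cgroup:
    name: str
    parent: Optional["Cgroup"] = None
    oom_group: bool = False  # models the memory.oom.group knob


def find_oom_group(victim: Cgroup,
                   oom_domain: Optional[Cgroup] = None) -> Optional[Cgroup]:
    """Walk from the victim's cgroup up to the OOMing cgroup (or the root
    when oom_domain is None) and return the highest-level cgroup that has
    oom_group set, or None if no cgroup on the path has it set."""
    found = None
    node = victim
    while node is not None:
        if node.oom_group:
            found = node          # keep overwriting: the last hit is highest
        if node is oom_domain:
            break                 # do not look above the OOMing cgroup
        node = node.parent
    return found
```

If the function returns a cgroup, all tasks in it and its descendants would be killed together; if it returns None, only the selected victim task is killed.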
    • K
      mm: clarify CONFIG_PAGE_POISONING and usage · 8c9a134c
      Kees Cook committed
      The Kconfig text for CONFIG_PAGE_POISONING doesn't mention that it has to
      be enabled explicitly.  This updates the documentation for that and adds a
      note about CONFIG_PAGE_POISONING to the "page_poison" command line docs.
      While here, change description of CONFIG_PAGE_POISONING_ZERO too, as it's
      not "random" data, but rather the fixed debugging value that would be used
      when not zeroing.  Additionally removes a stray "bool" in the Kconfig.
      
      Link: http://lkml.kernel.org/r/20180725223832.GA43733@beast
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8c9a134c
  9. 18 Aug, 2018 2 commits
  10. 10 Aug, 2018 3 commits
    • L
      PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support · aaca43fd
      Logan Gunthorpe committed
      To support peer-to-peer traffic on a segment of the PCI hierarchy, we must
      disable the ACS redirect bits for select PCI bridges.  The bridges must be
      selected before the devices are discovered by the kernel and the IOMMU
      groups created.  Therefore, add a kernel command line parameter to specify
      devices which must have their ACS bits disabled.
      
      The new parameter takes a list of devices separated by a semicolon.  Each
      device specified will have its ACS redirect bits disabled.  This is
      similar to the existing 'resource_alignment' parameter.
      
      The ACS P2P Request Redirect, P2P Completion Redirect and P2P
      Egress Control bits are disabled, which is sufficient to always allow
      passing P2P traffic uninterrupted.  The bits are set after the kernel
      (optionally) enables the ACS bits itself.  It is also done regardless of
      whether the kernel or platform firmware sets the bits.
      
      If the user tries to disable the ACS redirect for a device without the ACS
      capability, print a warning to dmesg.
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      [bhelgaas: reorder to add the generic code first and move the
      device-specific quirk to subsequent patches]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Stephen Bates <sbates@raithlin.com>
      Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      aaca43fd
    • L
      PCI: Allow specifying devices using a base bus and path of devfns · 45db3370
      Logan Gunthorpe committed
      When specifying PCI devices on the kernel command line using a
      bus/device/function address, bus numbers can change when adding or
      replacing a device, changing motherboard firmware, or applying kernel
      parameters like "pci=assign-buses".  When bus numbers change, it's likely
      the command line tweak will be applied to the wrong device.
      
      Therefore, it is useful to be able to specify devices with a base bus
      number and the path of devfns needed to get to it, similar to the "device
      scope" structure in the Intel VT-d spec, Section 8.3.1.
      
      Thus, we add an option to specify devices in the following format:
      
        [<domain>:]<bus>:<device>.<func>[/<device>.<func>]*
      
      The path can be any segment within the PCI hierarchy of any length and
      determined through the use of 'lspci -t'.  When specified this way, it is
      less likely that a renumbered bus will result in a valid device
      specification and the tweak won't be applied to the wrong device.
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      [bhelgaas: use "device" instead of "slot" in documentation since that's the
      usual language in the PCI specs]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Stephen Bates <sbates@raithlin.com>
      Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      45db3370
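The device specification format introduced in the commit above, `[<domain>:]<bus>:<device>.<func>[/<device>.<func>]*`, can be illustrated with a small parser. This is a sketch under assumptions: all fields are treated as hexadecimal, the field widths in the regular expressions are a guess at reasonable limits, and the names `PciPath` and `parse_pci_path` are hypothetical. The kernel's own parser is written in C and may accept or reject slightly different inputs.

```python
import re
from typing import List, NamedTuple, Tuple


class PciPath(NamedTuple):
    domain: int
    bus: int
    devfns: List[Tuple[int, int]]  # (device, function) hops from the base bus


# First component: optional domain, then bus:device.func
_HEAD = re.compile(
    r"^(?:([0-9a-fA-F]{1,4}):)?([0-9a-fA-F]{1,2}):([0-9a-fA-F]{1,2})\.([0-7])$"
)
# Each following path component: device.func
_HOP = re.compile(r"^([0-9a-fA-F]{1,2})\.([0-7])$")


def parse_pci_path(spec: str) -> PciPath:
    head, *hops = spec.split("/")
    m = _HEAD.match(head)
    if m is None:
        raise ValueError(f"bad base device: {head!r}")
    domain = int(m.group(1) or "0", 16)   # domain defaults to 0 when omitted
    bus = int(m.group(2), 16)
    devfns = [(int(m.group(3), 16), int(m.group(4), 16))]
    for hop in hops:
        h = _HOP.match(hop)
        if h is None:
            raise ValueError(f"bad path component: {hop!r}")
        devfns.append((int(h.group(1), 16), int(h.group(2), 16)))
    for dev, _func in devfns:
        if dev > 0x1F:                    # PCI device numbers are 5 bits wide
            raise ValueError(f"device number out of range: {dev:#x}")
    return PciPath(domain, bus, devfns)
```

A specification such as `0000:00:1c.0/00.0` names function 0 of device 0 behind the bridge at 00:1c.0, so it stays meaningful even if the secondary bus behind that bridge is renumbered.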
    • L
      PCI: Make specifying PCI devices in kernel parameters reusable · 07d8d7e5
      Logan Gunthorpe committed
      Separate out the code to match a PCI device with a string (typically
      originating from a kernel parameter) from the
      pci_specified_resource_alignment() function into its own helper function.
      
      While we are at it, this change fixes the kernel style of the function
      (fixing a number of long lines and extra parentheses).
      
      Additionally, make the analogous change to the kernel parameter
      documentation: Separate the description of how to specify a PCI device
      into its own section at the head of the "pci=" parameter.
      
      This patch should have no functional alterations.
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      [bhelgaas: use "device" instead of "slot" in documentation since that's the
      usual language in the PCI specs]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Stephen Bates <sbates@raithlin.com>
      Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      07d8d7e5
  11. 07 Aug, 2018 1 commit
  12. 05 Aug, 2018 2 commits
  13. 02 Aug, 2018 1 commit
  14. 27 Jul, 2018 1 commit
    • O
      iommu: Add config option to set passthrough as default · 58d11317
      Olof Johansson committed
      This allows the default behavior to be controlled by a kernel config
      option instead of changing the commandline for the kernel to include
      "iommu.passthrough=on" or "iommu=pt" on machines where this is desired.
      
      Likewise, for machines where this config option is enabled, it can be
      disabled at boot time with "iommu.passthrough=off" or "iommu=nopt".
      
      Also corrected iommu=pt documentation for IA-64, since it has no code that
      parses iommu= at all.
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      58d11317
  15. 21 Jul, 2018 1 commit
  16. 20 Jul, 2018 2 commits
    • T
      Documentation/l1tf: Fix typos · 1949f9f4
      Tony Luck committed
      Fix spelling and other typos
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      1949f9f4
    • P
      x86/tsc: Redefine notsc to behave as tsc=unstable · fe9af81e
      Pavel Tatashin committed
      Currently, the notsc kernel parameter disables the use of the TSC by
      sched_clock(). However, this parameter does not prevent the kernel from
      accessing tsc in other places.
      
      The only rationale to boot with notsc is to avoid timing discrepancies on
      multi-socket systems where TSC are not properly synchronized, and thus
      exclude TSC from being used for time keeping. But that prevents using TSC
      as sched_clock() as well, which is not necessary as the core sched_clock()
      implementation can handle non synchronized TSC based sched clocks just
      fine.
      
      However, there is another method to solve the above problem: booting with
      tsc=unstable parameter. This parameter allows sched_clock() to use TSC and
      just excludes it from timekeeping.
      
      So there is no real reason to keep notsc, but for compatibility reasons the
      parameter has to stay. Make it behave like 'tsc=unstable' instead.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com
      fe9af81e
  17. 18 Jul, 2018 2 commits
  18. 13 Jul, 2018 3 commits
    • T
      Documentation: Add section about CPU vulnerabilities · 3ec8ce5d
      Thomas Gleixner committed
      Add documentation for the L1TF vulnerability and the mitigation mechanisms:
      
        - Explain the problem and risks
        - Document the mitigation mechanisms
        - Document the command line controls
        - Document the sysfs files
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20180713142323.287429944@linutronix.de
      3ec8ce5d
    • J
      x86/bugs, kvm: Introduce boot-time control of L1TF mitigations · d90a7a0e
      Jiri Kosina committed
      Introduce the 'l1tf=' kernel command line option to allow for boot-time
      switching of mitigation that is used on processors affected by L1TF.
      
      The possible values are:
      
        full
      	Provides all available mitigations for the L1TF vulnerability. Disables
      	SMT and enables all mitigations in the hypervisors. SMT control via
      	/sys/devices/system/cpu/smt/control is still possible after boot.
      	Hypervisors will issue a warning when the first VM is started in
      	a potentially insecure configuration, i.e. SMT enabled or L1D flush
      	disabled.
      
        full,force
      	Same as 'full', but disables SMT control. Implies the 'nosmt=force'
      	command line option. sysfs control of SMT and the hypervisor flush
      	control is disabled.
      
        flush
      	Leaves SMT enabled and enables the conditional hypervisor mitigation.
      	Hypervisors will issue a warning when the first VM is started in a
      	potentially insecure configuration, i.e. SMT enabled or L1D flush
      	disabled.
      
        flush,nosmt
      	Disables SMT and enables the conditional hypervisor mitigation. SMT
      	control via /sys/devices/system/cpu/smt/control is still possible
      	after boot. If SMT is reenabled or flushing disabled at runtime
      	hypervisors will issue a warning.
      
        flush,nowarn
      	Same as 'flush', but hypervisors will not warn when
      	a VM is started in a potentially insecure configuration.
      
        off
      	Disables hypervisor mitigations and doesn't emit any warnings.
      
      Default is 'flush'.
      
      Let KVM adhere to these semantics, which means:
      
        - 'l1tf=full,force'     : Perform L1D flushes. No runtime control
                                  possible.

        - 'l1tf=full'
        - 'l1tf=flush'
        - 'l1tf=flush,nosmt'    : Perform L1D flushes and warn on VM start if
                                  SMT has been runtime enabled or L1D flushing
                                  has been runtime disabled.

        - 'l1tf=flush,nowarn'   : Perform L1D flushes and no warnings are emitted.

        - 'l1tf=off'            : L1D flushes are not performed and no warnings
                                  are emitted.

      KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush'
      module parameter except when l1tf=full,force is set.
      
      This makes KVM's private 'nosmt' option redundant, and as it is a bit
      non-systematic anyway (this is something to control globally, not on
      hypervisor level), remove that option.
      
      Add the missing Documentation entry for the l1tf vulnerability sysfs file
      while at it.
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Jiri Kosina <jkosina@suse.cz>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
      d90a7a0e
    • P
      rcutorture: Change units of onoff_interval to jiffies · 028be12b
      Paul E. McKenney authored
      Some RCU bugs have been sensitive to the frequency of CPU-hotplug
      operations, which have been gradually increased over time.  But this
      frequency is now at the one-second lower limit that can be specified using
      the rcutorture.onoff_interval kernel parameter.  This commit therefore
      changes the units of rcutorture.onoff_interval from seconds to jiffies,
      and also sets the value specified for this kernel parameter in the TREE03
      rcutorture scenario to 200, which is 200 milliseconds for HZ=1000.
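
The unit change is easiest to see as arithmetic: one jiffy lasts 1/HZ seconds. The helper below is a minimal sketch with a hypothetical name, `jiffies_to_ms`, and the HZ=250 case is an assumed example that does not appear in the commit message.

```python
def jiffies_to_ms(jiffies: int, hz: int) -> float:
    """Convert a jiffies count to milliseconds for a given tick rate HZ."""
    return jiffies * 1000 / hz


# The TREE03 value from the commit message: 200 jiffies at HZ=1000.
print(jiffies_to_ms(200, 1000))  # 200.0
# Assumed example: the same setting on a HZ=250 kernel lasts four times longer.
print(jiffies_to_ms(200, 250))   # 800.0
```

The same onoff_interval value therefore means a different wall-clock interval depending on the kernel's HZ setting.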
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      028be12b
  19. 11 Jul, 2018 2 commits
  20. 10 Jul, 2018 1 commit
    • R
      driver core: allow stopping deferred probe after init · 25b4e70d
      Rob Herring authored
      Deferred probe will currently wait forever on dependent devices to probe,
      but sometimes a driver will never exist. It's also not always critical for
      a driver to exist. Platforms can rely on default configuration from the
      bootloader or reset defaults for things such as pinctrl and power domains.
      This is often the case with initial platform support until various drivers
      get enabled. There are at least two scenarios where deferred probe can
      render a platform broken. Both involve using a DT which has more devices
      and dependencies than the kernel supports. In the first case, a driver
      is disabled in the kernel config. In the second case, the kernel version
      simply does not have the dependent driver. This can happen if using a newer DT
      (provided by firmware perhaps) with a stable kernel version. Deferred
      probe issues can be difficult to debug especially if the console has
      dependencies or userspace fails to boot to a shell.
      
      There are also cases like IOMMUs where only built-in drivers are
      supported, so deferring probe after initcalls is not needed. The IOMMU
      subsystem implemented its own mechanism to handle this using OF_DECLARE
      linker sections.
      
      This commit makes ending deferred probe conditional on initcalls
      being completed or a debug timeout. Subsystems or drivers may opt-in by
      calling driver_deferred_probe_check_init_done() instead of
      unconditionally returning -EPROBE_DEFER. They may use additional
      information from DT or kernel's config to decide whether to continue to
      defer probe or not.
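
The opt-in decision described above can be modeled as a small function. This is a toy model in Python based only on this commit message, not the kernel implementation: the function name `check_init_done` and its parameters are hypothetical, and returning -ENODEV on the give-up path is an assumption. EPROBE_DEFER is kernel-internal, so its value is defined by hand.

```python
import errno

# Kernel-internal errno value; Python's errno module does not expose it.
EPROBE_DEFER = 517


def check_init_done(initcalls_done: bool, timeout_expired: bool) -> int:
    """Toy model: keep deferring until initcalls finish or the debug timeout fires.

    Assumption: once deferral ends, the caller sees -ENODEV and treats the
    dependency as absent. The commit message does not state the error code.
    """
    if initcalls_done or timeout_expired:
        return -errno.ENODEV
    return -EPROBE_DEFER


# While the kernel is still running initcalls, the probe stays deferred.
print(check_init_done(initcalls_done=False, timeout_expired=False) == -EPROBE_DEFER)
```

A subsystem that opts in would substitute such a check for an unconditional -EPROBE_DEFER return.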
      
      The timeout mechanism is intended for debug purposes and WARNs loudly.
      The remaining deferred probe pending list will also be dumped after the
      timeout. Note that this timeout won't work for the console, which needs
      to be enabled before userspace starts. However, if the console's
      dependencies are resolved, then the kernel log will be printed (as
      opposed to no output).
      
      Cc: Alexander Graf <agraf@suse.de>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      25b4e70d
  21. 09 Jul, 2018 1 commit
  22. 06 Jul, 2018 1 commit
  23. 05 Jul, 2018 2 commits
    • K
      x86/KVM/VMX: Add module argument for L1TF mitigation · a399477e
      Konrad Rzeszutek Wilk authored
      Add a mitigation mode parameter "vmentry_l1d_flush" for CVE-2018-3620, aka
      L1 terminal fault. The valid arguments are:
      
       - "always" 	L1D cache flush on every VMENTER.
       - "cond"	Conditional L1D cache flush, explained below
       - "never"	Disable the L1D cache flush mitigation
      
      "cond" is trying to avoid L1D cache flushes on VMENTER if the code executed
      between VMEXIT and VMENTER is considered safe, i.e. is not bringing any
      interesting information into L1D which might be exploited.
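
The three modes reduce to a simple decision at each VMENTER. The sketch below is a minimal Python model, not KVM code: `should_flush_l1d` and its `ran_unsafe_code` flag are hypothetical names standing in for whatever tracking KVM does between VMEXIT and VMENTER.

```python
def should_flush_l1d(mode: str, ran_unsafe_code: bool) -> bool:
    """Decide whether to flush L1D on VMENTER for a 'vmentry_l1d_flush' mode.

    ran_unsafe_code is a hypothetical flag: True if code executed since the
    last VMEXIT may have pulled interesting data into L1D.
    """
    if mode == "always":
        return True
    if mode == "never":
        return False
    if mode == "cond":
        return ran_unsafe_code
    raise ValueError(f"unknown vmentry_l1d_flush mode: {mode!r}")


print(should_flush_l1d("cond", ran_unsafe_code=False))  # False
```

In "cond" mode the cost of a flush is paid only when the intervening code is not considered safe.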
      
      [ tglx: Split out from a larger patch ]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a399477e
    • K
      x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present · 26acfb66
      Konrad Rzeszutek Wilk authored
      If the L1TF CPU bug is present we allow the KVM module to be loaded as the
      majority of users that use Linux and KVM have trusted guests and do not want a
      broken setup.
      
      Cloud vendors are the ones that are uncomfortable with CVE-2018-3620 and as
      such they are the ones that should set nosmt to one.
      
      Setting 'nosmt' means that the system administrator also needs to disable
      SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line
      parameter, or via the /sys/devices/system/cpu/smt/control. See commit
      05736e4a ("cpu/hotplug: Provide knobs to control SMT").
      
      Other mitigations are to use task affinity, cpu sets, interrupt binding,
      etc - anything to make sure that _only_ the same guest's vCPUs are running
      on sibling threads.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      26acfb66
  24. 04 Jul, 2018 1 commit
    • C
      usercopy: Allow boot cmdline disabling of hardening · b5cb15d9
      Chris von Recklinghausen authored
      Enabling HARDENED_USERCOPY may cause measurable regressions in networking
      performance: up to 8% under UDP flood.
      
      I ran a small-packet UDP flood using pktgen against a host connected back-to-back. On
      the receiver side the UDP packets are processed by a simple user space
      process that just reads and drops them:
      
      https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c
      
      Not very useful from a functional PoV, but it helps to pin-point
      bottlenecks in the networking stack.
      
      When running a kernel with CONFIG_HARDENED_USERCOPY=y, I see a 5-8%
      regression in the receive throughput, compared to the same kernel without this
      option enabled.
      
      With CONFIG_HARDENED_USERCOPY=y, perf shows ~6% of CPU time spent
      cumulatively in __check_object_size (~4%) and __virt_addr_valid (~2%).
      
      The call-chain is:
      
      __GI___libc_recvfrom
      entry_SYSCALL_64_after_hwframe
      do_syscall_64
      __x64_sys_recvfrom
      __sys_recvfrom
      inet_recvmsg
      udp_recvmsg
      __check_object_size
      
      udp_recvmsg() actually calls copy_to_iter() (inlined) and the latter
      calls check_copy_size() (again, inlined).
      
      A generic distro may want to enable HARDENED_USERCOPY in their default
      kernel config, but at the same time, such a distro may want to be able to
      avoid the performance penalties of the default configuration and
      disable the stricter check on a per-boot basis.
      
      This change adds a boot parameter that conditionally disables
      HARDENED_USERCOPY via "hardened_usercopy=off".
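
To illustrate the per-boot switch, the sketch below scans a kernel command line for the parameter. It is a minimal Python model, not the kernel's parser: `hardened_usercopy_enabled` is a hypothetical name, and treating every value other than "off" as enabled is an assumption, since this commit message only documents "hardened_usercopy=off".

```python
def hardened_usercopy_enabled(cmdline: str) -> bool:
    """Return False only if the command line carries 'hardened_usercopy=off'.

    Assumption: any other value, or no parameter at all, leaves the
    hardening enabled. The last occurrence on the command line wins.
    """
    enabled = True
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "hardened_usercopy" and sep:
            enabled = value != "off"
    return enabled


print(hardened_usercopy_enabled("root=/dev/sda1 quiet hardened_usercopy=off"))  # False
```

On a running system the same string could be read from /proc/cmdline to check what the current boot was given.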
      Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      b5cb15d9