1. 23 Aug, 2018 1 commit
  2. 18 Aug, 2018 2 commits
  3. 10 Aug, 2018 3 commits
    • PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support · aaca43fd
      Logan Gunthorpe committed
      To support peer-to-peer traffic on a segment of the PCI hierarchy, we must
      disable the ACS redirect bits for select PCI bridges.  The bridges must be
      selected before the devices are discovered by the kernel and the IOMMU
      groups created.  Therefore, add a kernel command line parameter to specify
      devices which must have their ACS bits disabled.
      
      The new parameter takes a semicolon-separated list of devices.  Each
      device specified will have its ACS redirect bits disabled.  This is
      similar to the existing 'resource_alignment' parameter.
      
      The ACS P2P Request Redirect, P2P Completion Redirect and P2P
      Egress Control bits are disabled, which is sufficient to always allow
      passing P2P traffic uninterrupted.  The bits are set after the kernel
      (optionally) enables the ACS bits itself.  It is also done regardless of
      whether the kernel or platform firmware sets the bits.
      
      If the user tries to disable the ACS redirect for a device without the ACS
      capability, print a warning to dmesg.
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      [bhelgaas: reorder to add the generic code first and move the
      device-specific quirk to subsequent patches]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Stephen Bates <sbates@raithlin.com>
      Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Christian König <christian.koenig@amd.com>
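
As a rough illustration of the commit above, the sketch below splits a semicolon-separated device list and clears the three redirect bits from an ACS control word. The bit values are the ACS Control register bits as I understand them from the PCIe specification; the function names `split_device_list` and `disable_acs_redir` are hypothetical and are not the kernel's implementation.

```python
# ACS Control register bits (assumed to match the PCIe specification).
PCI_ACS_RR = 0x04  # P2P Request Redirect
PCI_ACS_CR = 0x08  # P2P Completion Redirect
PCI_ACS_EC = 0x20  # P2P Egress Control

ACS_REDIR_MASK = PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC


def split_device_list(param):
    """Split a semicolon-separated device list, dropping empty entries."""
    return [dev.strip() for dev in param.split(";") if dev.strip()]


def disable_acs_redir(ctrl):
    """Return the 16-bit ACS control word with the redirect bits cleared."""
    return ctrl & ~ACS_REDIR_MASK & 0xFFFF


if __name__ == "__main__":
    for dev in split_device_list("0000:00:1c.0;0000:00:1d.0"):
        print(dev)
    print(hex(disable_acs_redir(0x7F)))
```

All other ACS bits (for example Source Validation) are left untouched by the mask, which mirrors the commit's statement that only the redirect bits are disabled.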
    • PCI: Allow specifying devices using a base bus and path of devfns · 45db3370
      Logan Gunthorpe committed
      When specifying PCI devices on the kernel command line using a
      bus/device/function address, bus numbers can change when adding or
      replacing a device, changing motherboard firmware, or applying kernel
      parameters like "pci=assign-buses".  When bus numbers change, it's likely
      the command line tweak will be applied to the wrong device.
      
      Therefore, it is useful to be able to specify devices with a base bus
      number and the path of devfns needed to get to it, similar to the "device
      scope" structure in the Intel VT-d spec, Section 8.3.1.
      
      Thus, we add an option to specify devices in the following format:
      
        [<domain>:]<bus>:<device>.<func>[/<device>.<func>]*
      
      The path can be any segment within the PCI hierarchy of any length and
      can be determined using 'lspci -t'.  When specified this way, it is
      less likely that a renumbered bus will result in a valid device
      specification, so the tweak is less likely to be applied to the wrong
      device.
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      [bhelgaas: use "device" instead of "slot" in documentation since that's the
      usual language in the PCI specs]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Stephen Bates <sbates@raithlin.com>
      Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Christian König <christian.koenig@amd.com>
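
To make the path format from the commit above concrete, here is a minimal parser sketch for `[<domain>:]<bus>:<device>.<func>[/<device>.<func>]*`. It assumes all numbers are hexadecimal and that an omitted domain means 0; the name `parse_pci_path` is hypothetical, and the sketch does not resolve the path against a real PCI hierarchy.

```python
import re

_DEVFN = r"[0-9a-fA-F]{1,2}\.[0-7]"
_PATH_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{1,4}):)?"   # optional domain
    r"(?P<bus>[0-9a-fA-F]{1,2}):"            # base bus number
    r"(?P<path>" + _DEVFN + r"(?:/" + _DEVFN + r")*)$"  # one or more devfns
)


def parse_pci_path(spec):
    """Return (domain, bus, [(device, function), ...]) for a path spec."""
    match = _PATH_RE.match(spec)
    if match is None:
        raise ValueError("not a valid PCI device path: %r" % spec)
    domain = int(match.group("domain") or "0", 16)
    bus = int(match.group("bus"), 16)
    devfns = []
    for part in match.group("path").split("/"):
        dev, func = part.split(".")
        devfns.append((int(dev, 16), int(func, 16)))
    return domain, bus, devfns


if __name__ == "__main__":
    print(parse_pci_path("0000:00:1c.0/00.0"))
```

Only the first element depends on a bus number; every later hop is identified by device and function alone, which is why such a specification is less sensitive to bus renumbering.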
    • PCI: Make specifying PCI devices in kernel parameters reusable · 07d8d7e5
      Logan Gunthorpe committed
      Separate out the code to match a PCI device with a string (typically
      originating from a kernel parameter) from the
      pci_specified_resource_alignment() function into its own helper function.
      
      While we are at it, this change fixes the kernel style of the function
      (fixing a number of long lines and extra parentheses).
      
      Additionally, make the analogous change to the kernel parameter
      documentation: Separate the description of how to specify a PCI device
      into its own section at the head of the "pci=" parameter.
      
      This patch should have no functional alterations.
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      [bhelgaas: use "device" instead of "slot" in documentation since that's the
      usual language in the PCI specs]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Stephen Bates <sbates@raithlin.com>
      Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Christian König <christian.koenig@amd.com>
  4. 07 Aug, 2018 1 commit
  5. 05 Aug, 2018 2 commits
  6. 02 Aug, 2018 1 commit
  7. 21 Jul, 2018 1 commit
  8. 20 Jul, 2018 2 commits
    • Documentation/l1tf: Fix typos · 1949f9f4
      Tony Luck committed
      Fix spelling and other typos
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/tsc: Redefine notsc to behave as tsc=unstable · fe9af81e
      Pavel Tatashin committed
      Currently, the notsc kernel parameter disables the use of the TSC by
      sched_clock(). However, this parameter does not prevent the kernel from
      accessing tsc in other places.
      
      The only rationale to boot with notsc is to avoid timing discrepancies on
      multi-socket systems where TSC are not properly synchronized, and thus
      exclude TSC from being used for time keeping. But that prevents using TSC
      as sched_clock() as well, which is not necessary as the core sched_clock()
      implementation can handle non synchronized TSC based sched clocks just
      fine.
      
      However, there is another method to solve the above problem: booting with
      tsc=unstable parameter. This parameter allows sched_clock() to use TSC and
      just excludes it from timekeeping.
      
      So there is no real reason to keep notsc, but for compatibility reasons the
      parameter has to stay. Make it behave like 'tsc=unstable' instead.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com
  9. 18 Jul, 2018 2 commits
  10. 13 Jul, 2018 3 commits
    • Documentation: Add section about CPU vulnerabilities · 3ec8ce5d
      Thomas Gleixner committed
      Add documentation for the L1TF vulnerability and the mitigation mechanisms:
      
        - Explain the problem and risks
        - Document the mitigation mechanisms
        - Document the command line controls
        - Document the sysfs files
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20180713142323.287429944@linutronix.de
    • x86/bugs, kvm: Introduce boot-time control of L1TF mitigations · d90a7a0e
      Jiri Kosina committed
      Introduce the 'l1tf=' kernel command line option to allow for boot-time
      switching of mitigation that is used on processors affected by L1TF.
      
      The possible values are:
      
        full
      	Provides all available mitigations for the L1TF vulnerability. Disables
      	SMT and enables all mitigations in the hypervisors. SMT control via
      	/sys/devices/system/cpu/smt/control is still possible after boot.
      	Hypervisors will issue a warning when the first VM is started in
      	a potentially insecure configuration, i.e. SMT enabled or L1D flush
      	disabled.
      
        full,force
      	Same as 'full', but disables SMT control. Implies the 'nosmt=force'
      	command line option. sysfs control of SMT and the hypervisor flush
      	control is disabled.
      
        flush
      	Leaves SMT enabled and enables the conditional hypervisor mitigation.
      	Hypervisors will issue a warning when the first VM is started in a
      	potentially insecure configuration, i.e. SMT enabled or L1D flush
      	disabled.
      
        flush,nosmt
      	Disables SMT and enables the conditional hypervisor mitigation. SMT
      	control via /sys/devices/system/cpu/smt/control is still possible
      	after boot. If SMT is reenabled or flushing disabled at runtime
      	hypervisors will issue a warning.
      
        flush,nowarn
      	Same as 'flush', but hypervisors will not warn when
      	a VM is started in a potentially insecure configuration.
      
        off
      	Disables hypervisor mitigations and doesn't emit any warnings.
      
      Default is 'flush'.
      
      Let KVM adhere to these semantics, which means:
      
        - 'l1tf=full,force'    : Perform L1D flushes. No runtime control
                                 possible.

        - 'l1tf=full'
        - 'l1tf=flush'
        - 'l1tf=flush,nosmt'   : Perform L1D flushes and warn on VM start if
                                 SMT has been runtime enabled or L1D flushing
                                 has been runtime disabled.

        - 'l1tf=flush,nowarn'  : Perform L1D flushes and no warnings are emitted.

        - 'l1tf=off'           : L1D flushes are not performed and no warnings
                                 are emitted.
      
      KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush'
      module parameter except when l1tf=full,force is set.
      
      This makes KVM's private 'nosmt' option redundant, and as it is a bit
      non-systematic anyway (this is something to control globally, not on
      hypervisor level), remove that option.
      
      Add the missing Documentation entry for the l1tf vulnerability sysfs file
      while at it.
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Jiri Kosina <jkosina@suse.cz>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
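
The 'l1tf=' values described in the commit above can be summarised as a small lookup table. This is only a reading aid under stated assumptions: the `L1tfPolicy` fields and the `l1tf_policy` helper are hypothetical names, the 'off' entry assumes SMT is left alone (the text only says hypervisor mitigations are disabled), and the 'full,force' entry assumes no warning is needed because no insecure state can be reached at runtime.

```python
from typing import NamedTuple


class L1tfPolicy(NamedTuple):
    smt_disabled_at_boot: bool
    smt_runtime_control: bool
    hypervisor_flush: str  # "all", "conditional" or "off"
    warn: bool


L1TF_POLICIES = {
    "full":         L1tfPolicy(True,  True,  "all",         True),
    # Assumption: nothing to warn about, runtime control is disabled.
    "full,force":   L1tfPolicy(True,  False, "all",         False),
    "flush":        L1tfPolicy(False, True,  "conditional", True),
    "flush,nosmt":  L1tfPolicy(True,  True,  "conditional", True),
    "flush,nowarn": L1tfPolicy(False, True,  "conditional", False),
    # Assumption: 'off' does not touch SMT.
    "off":          L1tfPolicy(False, True,  "off",         False),
}


def l1tf_policy(cmdline):
    """Return the policy for a kernel command line; the default is 'flush'."""
    value = "flush"
    for token in cmdline.split():
        if token.startswith("l1tf="):
            value = token[len("l1tf="):]
    return L1TF_POLICIES[value]  # raises KeyError for unknown values


if __name__ == "__main__":
    print(l1tf_policy("quiet l1tf=flush,nosmt"))
```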
    • rcutorture: Change units of onoff_interval to jiffies · 028be12b
      Paul E. McKenney committed
      Some RCU bugs have been sensitive to the frequency of CPU-hotplug
      operations, which have been gradually increased over time.  But this
      frequency is now at the one-second lower limit that can be specified using
      the rcutorture.onoff_interval kernel parameter.  This commit therefore
      changes the units of rcutorture.onoff_interval from seconds to jiffies,
      and also sets the value specified for this kernel parameter in the TREE03
      rcutorture scenario to 200, which is 200 milliseconds for HZ=1000.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
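
The unit change in the commit above means the wall-clock interval now depends on the kernel's HZ setting. A minimal arithmetic sketch (the helper name `jiffies_to_ms` is chosen here for illustration):

```python
def jiffies_to_ms(jiffies, hz):
    """Convert a jiffies count to milliseconds for a given tick rate HZ."""
    return jiffies * 1000.0 / hz


if __name__ == "__main__":
    # The TREE03 value from the commit message, at two common HZ settings.
    print(jiffies_to_ms(200, 1000))
    print(jiffies_to_ms(200, 250))
```

With HZ=1000 the value 200 corresponds to 200 milliseconds, as the commit states; the same value on an HZ=250 kernel would be a longer interval.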
  11. 11 Jul, 2018 2 commits
  12. 10 Jul, 2018 1 commit
    • driver core: allow stopping deferred probe after init · 25b4e70d
      Rob Herring committed
      Deferred probe will currently wait forever on dependent devices to probe,
      but sometimes a driver will never exist. It's also not always critical for
      a driver to exist. Platforms can rely on default configuration from the
      bootloader or reset defaults for things such as pinctrl and power domains.
      This is often the case with initial platform support until various drivers
      get enabled. There are at least two scenarios where deferred probe can
      render a platform broken. Both involve using a DT which has more devices
      and dependencies than the kernel supports. In the first case, a driver may
      be disabled in the kernel config. In the second case, the kernel version may
      simply not have the dependent driver. This can happen when using a newer DT
      (provided by firmware perhaps) with a stable kernel version. Deferred
      probe issues can be difficult to debug especially if the console has
      dependencies or userspace fails to boot to a shell.
      
      There are also cases like IOMMUs where only built-in drivers are
      supported, so deferring probe after initcalls is not needed. The IOMMU
      subsystem implemented its own mechanism to handle this using OF_DECLARE
      linker sections.
      
      This commit makes ending deferred probe conditional on initcalls
      being completed or a debug timeout. Subsystems or drivers may opt-in by
      calling driver_deferred_probe_check_init_done() instead of
      unconditionally returning -EPROBE_DEFER. They may use additional
      information from DT or kernel's config to decide whether to continue to
      defer probe or not.
      
      The timeout mechanism is intended for debug purposes and WARNs loudly.
      The remaining deferred probe pending list will also be dumped after the
      timeout. Note that this timeout won't work for the console which needs
      to be enabled before userspace starts. However, if the console's
      dependencies are resolved, then the kernel log will be printed (as
      opposed to no output).
      
      Cc: Alexander Graf <agraf@suse.de>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  13. 09 Jul, 2018 1 commit
  14. 06 Jul, 2018 1 commit
  15. 05 Jul, 2018 2 commits
    • x86/KVM/VMX: Add module argument for L1TF mitigation · a399477e
      Konrad Rzeszutek Wilk committed
      Add a mitigation mode parameter "vmentry_l1d_flush" for CVE-2018-3620, aka
      L1 terminal fault. The valid arguments are:
      
       - "always" 	L1D cache flush on every VMENTER.
       - "cond"	Conditional L1D cache flush, explained below
       - "never"	Disable the L1D cache flush mitigation
      
      "cond" is trying to avoid L1D cache flushes on VMENTER if the code executed
      between VMEXIT and VMENTER is considered safe, i.e. is not bringing any
      interesting information into L1D which might be exploited.
      
      [ tglx: Split out from a larger patch ]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
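
The three "vmentry_l1d_flush" modes from the commit above reduce to a simple decision, sketched below. The boolean `ran_unsafe_code` is a hypothetical stand-in for "the code executed between VMEXIT and VMENTER may have brought interesting data into L1D"; how the kernel actually tracks that condition is not described in the commit message and is not modelled here.

```python
VALID_MODES = ("always", "cond", "never")


def should_flush_l1d(mode, ran_unsafe_code):
    """Decide whether to flush L1D before the next VMENTER."""
    if mode not in VALID_MODES:
        raise ValueError("unknown vmentry_l1d_flush mode: %r" % mode)
    if mode == "always":
        return True
    if mode == "never":
        return False
    # "cond": flush only when the preceding code was not considered safe.
    return ran_unsafe_code


if __name__ == "__main__":
    for mode in VALID_MODES:
        print(mode, should_flush_l1d(mode, ran_unsafe_code=False))
```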
    • x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present · 26acfb66
      Konrad Rzeszutek Wilk committed
      If the L1TF CPU bug is present we allow the KVM module to be loaded, as the
      majority of users that use Linux and KVM have trusted guests and do not want
      a broken setup.
      
      Cloud vendors are the ones that are uncomfortable with CVE-2018-3620 and as
      such they are the ones that should set nosmt to one.
      
      Setting 'nosmt' means that the system administrator also needs to disable
      SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line
      parameter, or via the /sys/devices/system/cpu/smt/control. See commit
      05736e4a ("cpu/hotplug: Provide knobs to control SMT").
      
      Other mitigations are to use task affinity, cpu sets, interrupt binding,
      etc - anything to make sure that _only_ the same guests vCPUs are running
      on sibling threads.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  16. 04 Jul, 2018 1 commit
    • usercopy: Allow boot cmdline disabling of hardening · b5cb15d9
      Chris von Recklinghausen committed
      Enabling HARDENED_USERCOPY may cause measurable regressions in networking
      performance: up to 8% under UDP flood.
      
      I ran a small packet UDP flood using pktgen vs. a host b2b connected. On
      the receiver side the UDP packets are processed by a simple user space
      process that just reads and drops them:
      
      https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c
      
      Not very useful from a functional PoV, but it helps to pin-point
      bottlenecks in the networking stack.
      
      When running a kernel with CONFIG_HARDENED_USERCOPY=y, I see a 5-8%
      regression in the receive tput, compared to the same kernel without this
      option enabled.
      
      With CONFIG_HARDENED_USERCOPY=y, perf shows ~6% of CPU time spent
      cumulatively in __check_object_size (~4%) and __virt_addr_valid (~2%).
      
      The call-chain is:
      
      __GI___libc_recvfrom
      entry_SYSCALL_64_after_hwframe
      do_syscall_64
      __x64_sys_recvfrom
      __sys_recvfrom
      inet_recvmsg
      udp_recvmsg
      __check_object_size
      
      udp_recvmsg() actually calls copy_to_iter() (inlined) and the latter
      calls check_copy_size() (again, inlined).
      
      A generic distro may want to enable HARDENED_USERCOPY in their default
      kernel config, but at the same time, such distro may want to be able to
      avoid the performance penalties with the default configuration and
      disable the stricter check on a per-boot basis.
      
      This change adds a boot parameter that conditionally disables
      HARDENED_USERCOPY via "hardened_usercopy=off".
      Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
  17. 02 Jul, 2018 1 commit
    • Revert "x86/apic: Ignore secondary threads if nosmt=force" · 506a66f3
      Thomas Gleixner committed
      Dave Hansen reported, that it's outright dangerous to keep SMT siblings
      disabled completely so they are stuck in the BIOS and wait for SIPI.
      
      The reason is that Machine Check Exceptions are broadcasted to siblings and
      the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
      logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
      reboots the machine. The MCE chapter in the SDM contains the following
      blurb:
      
          Because the logical processors within a physical package are tightly
          coupled with respect to shared hardware resources, both logical
          processors are notified of machine check errors that occur within a
          given physical processor. If machine-check exceptions are enabled when
          a fatal error is reported, all the logical processors within a physical
          package are dispatched to the machine-check exception handler. If
          machine-check exceptions are disabled, the logical processors enter the
          shutdown state and assert the IERR# signal. When enabling machine-check
          exceptions, the MCE flag in control register CR4 should be set for each
          logical processor.
      
      Reverting the commit which ignores siblings at enumeration time solves only
      half of the problem. The core cpuhotplug logic needs to be adjusted as
      well.
      
      This thoughtfully engineered mechanism also turns the boot process on all
      Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU
      before the secondary CPUs are brought up. Depending on the number of
      physical cores the window in which this situation can happen is smaller or
      larger. On a HSW-EX it's about 750ms:
      
      MCE is enabled on the boot CPU:
      
      [    0.244017] mce: CPU supports 22 MCE banks
      
      The corresponding sibling #72 boots:
      
      [    1.008005] .... node  #0, CPUs:    #72
      
      That means if an MCE hits on physical core 0 (logical CPUs 0 and 72)
      between these two points the machine is going to shut down. At least it's a
      known safe state.
      
      It's obvious that the early boot can be hit by an MCE as well and then runs
      into the same situation because MCEs are not yet enabled on the boot CPU.
      But after enabling them on the boot CPU, it does not make any sense to
      prevent the kernel from recovering.
      
      Adjust the nosmt kernel parameter documentation as well.
      
      Reverts: 2207def7 ("x86/apic: Ignore secondary threads if nosmt=force")
      Reported-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Tony Luck <tony.luck@intel.com>
  18. 30 Jun, 2018 1 commit
  19. 29 Jun, 2018 1 commit
  20. 27 Jun, 2018 2 commits
  21. 21 Jun, 2018 2 commits
    • cpu/hotplug: Provide knobs to control SMT · 05736e4a
      Thomas Gleixner committed
      Provide a command line and a sysfs knob to control SMT.
      
      The command line options are:
      
       'nosmt':	Enumerate secondary threads, but do not online them
       		
       'nosmt=force': Ignore secondary threads completely during enumeration
       		via MP table and ACPI/MADT.
      
      The sysfs control file has the following states (read/write):
      
       'on':		 SMT is enabled. Secondary threads can be freely onlined
       'off':		 SMT is disabled. Secondary threads, even if enumerated
       		 cannot be onlined
       'forceoff':	 SMT is permanently disabled. Writes to the control
       		 file are rejected.
       'notsupported': SMT is not supported by the CPU
      
      The command line option 'nosmt' sets the sysfs control to 'off'. This
      can be changed to 'on' to reenable SMT during runtime.
      
      The command line option 'nosmt=force' sets the sysfs control to
      'forceoff'. This cannot be changed during runtime.
      
      When SMT is 'on' and the control file is changed to 'off' then all online
      secondary threads are offlined and attempts to online a secondary thread
      later on are rejected.
      
      When SMT is 'off' and the control file is changed to 'on' then secondary
      threads can be onlined again. The 'off' -> 'on' transition does not
      automatically online the secondary threads.
      
      When the control file is set to 'forceoff', the behaviour is the same as
      setting it to 'off', but the operation is irreversible and later writes to
      the control file are rejected.
      
      When the control status is 'notsupported' then writes to the control file
      are rejected.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
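
The control-file semantics described in the commit above form a small state machine, modelled below as a toy class. `SmtControl` and its methods are hypothetical names for illustration; the sketch only tracks the control state and does not model onlining or offlining of individual threads.

```python
class SmtControl:
    """Toy model of the SMT sysfs control file states."""

    WRITABLE = ("on", "off", "forceoff")

    def __init__(self, cmdline="", supported=True):
        tokens = cmdline.split()
        if not supported:
            self.state = "notsupported"
        elif "nosmt=force" in tokens:
            self.state = "forceoff"
        elif "nosmt" in tokens:
            self.state = "off"
        else:
            self.state = "on"

    def write(self, value):
        """Emulate a write to the control file."""
        if self.state in ("forceoff", "notsupported"):
            raise PermissionError("control state %r rejects writes" % self.state)
        if value not in self.WRITABLE:
            raise ValueError("invalid SMT control value: %r" % value)
        self.state = value

    def can_online_secondary(self):
        """Secondary threads may only be onlined while SMT is 'on'."""
        return self.state == "on"


if __name__ == "__main__":
    ctl = SmtControl("quiet nosmt")
    print(ctl.state)
    ctl.write("on")
    print(ctl.can_online_secondary())
```

Note that, as the commit says, the 'off' to 'on' transition only permits onlining again; it does not bring the secondary threads online by itself.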
    • Documentation: intel_pstate: Fix typo · 7a0f9d1e
      Rafael J. Wysocki committed
      Fix a typo in the intel_pstate admin-guide documentation.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  22. 16 Jun, 2018 2 commits
  23. 08 Jun, 2018 4 commits
  24. 07 Jun, 2018 1 commit