1. 20 Sep 2022, 1 commit
  2. 29 Jul 2022, 1 commit
    • x86/traps: Handle #DB for bus lock · c8db5890
      Committed by Fenghua Yu
      mainline inclusion
      from mainline-v5.12-rc4
      commit ebb1064e
      category: feature
      bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5G10C
      CVE: NA
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ebb1064e
      
      Intel-SIG: commit ebb1064e x86/traps: Handle #DB for bus lock
      
      --------------------------------
      
      Bus locks degrade performance for the whole system, not just for the CPU
      that requested the bus lock. Two CPU features "#AC for split lock" and
      "#DB for bus lock" provide hooks so that the operating system may choose
      one of several mitigation strategies.
      
      "#AC for split lock" is already supported. Add support for the "#DB
      for bus lock" feature to cover additional situations with new options
      to mitigate.
      
      split_lock_detect=
      		#AC for split lock		#DB for bus lock
      
      off		Do nothing			Do nothing
      
      warn		Kernel OOPs			Warn once per task and
      		Warn once per task and		continue to run.
      		disable future checking
      	 	When both features are
      		supported, warn in #AC
      
      fatal		Kernel OOPs			Send SIGBUS to user.
      		Send SIGBUS to user
      		When both features are
      		supported, fatal in #AC
      
      ratelimit:N	Do nothing			Limit bus lock rate to
      						N per second in the
      						current non-root user.
      
      Default option is "warn".
      
      Hardware only generates #DB for bus lock detect when CPL>0 to avoid
      nested #DB from multiple bus locks while the first #DB is being handled.
      So no need to handle #DB for bus lock detected in the kernel.
      
      #DB for bus lock is enabled by bus lock detection bit 2 in DEBUGCTL
      MSR, while #AC for split lock is enabled by split lock detection
      bit 29 in TEST_CTRL MSR.
      
      Both breakpoint and bus lock in the same instruction can trigger one #DB.
      The bus lock is handled before the breakpoint in the #DB handler.
      
      Delivery of #DB for bus lock in userspace clears DR6[11], which is set by
      the #DB handler right after reading DR6.
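      
      The handler work described above boils down to something like the
      following sketch (a condensed illustration, not the verbatim patch;
      handle_bus_lock() and sld_state are the names used by this series):
      
        /*
         * Sketch: in the user #DB path the bus lock is processed before
         * any breakpoint work, and DR6_BUS_LOCK (bit 11, cleared by
         * delivery of the exception) is set again right after DR6 is read.
         */
        static void handle_bus_lock(struct pt_regs *regs)
        {
                switch (sld_state) {            /* split_lock_detect= mode */
                case sld_warn:
                        pr_warn_ratelimited("#DB: bus_lock trap at 0x%lx\n",
                                            regs->ip);
                        break;
                case sld_fatal:
                        force_sig_fault(SIGBUS, BUS_ADRALN, NULL);
                        break;
                default:
                        break;
                }
        }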
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Link: https://lore.kernel.org/r/20210322135325.682257-3-fenghua.yu@intel.com
      
      (cherry picked from commit ebb1064e)
      Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
      c8db5890
  3. 08 Jul 2022, 1 commit
  4. 06 Jul 2022, 2 commits
  5. 22 Feb 2022, 2 commits
  6. 29 Dec 2021, 2 commits
  7. 18 Sep 2020, 1 commit
  8. 10 Sep 2020, 1 commit
  9. 08 Sep 2020, 2 commits
  10. 18 Aug 2020, 2 commits
    • perf/x86/intel: Support TopDown metrics on Ice Lake · 59a854e2
      Committed by Kan Liang
      Ice Lake supports the hardware TopDown metrics feature, which can free
      up the scarce GP counters.
      
      Update the event constraints for the metrics events. The metric
      counters do not exist in hardware; they are mapped to a dummy offset.
      Sharing the same metric between multiple users without multiplexing
      is not allowed.
      
      Implement set_topdown_event_period for Ice Lake. The values in
      PERF_METRICS MSR are derived from the fixed counter 3. Both registers
      should start from zero.
      
      Implement update_topdown_event for Ice Lake. The metric is reported by
      multiplying the metric (fraction) with slots. To maintain accurate
      measurements, both registers are cleared for each update. The fixed
      counter 3 should always be cleared before the PERF_METRICS.
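      
      As a sketch of the arithmetic (helper name and rounding are
      illustrative; PERF_METRICS packs one 8-bit fraction per metric):
      
        /* metric count = slots * (8-bit fraction / 0xff) */
        static u64 icl_metric_delta(u64 metrics, int idx, u64 slots)
        {
                u32 frac = (metrics >> (idx * 8)) & 0xff;
        
                return div_u64(slots * frac, 0xff);
        }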
      
      Implement td_attr for the new metrics events and the new slots fixed
      counter. Make them visible to the perf user tools.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-11-kan.liang@linux.intel.com
      59a854e2
    • perf/x86/intel: Generic support for hardware TopDown metrics · 7b2c05a1
      Committed by Kan Liang
      Intro
      =====
      
      The TopDown Microarchitecture Analysis (TMA) Method is a structured
      analysis methodology to identify critical performance bottlenecks in
      out-of-order processors. perf already supports this method.
      
      The method works well, but there is one problem. To collect the TopDown
      events, several GP counters have to be used. If a user wants to collect
      other events at the same time, multiplexing will likely be triggered,
      which impacts the accuracy.
      
      To free up the scarce GP counters, the hardware TopDown metrics feature
      is introduced from Ice Lake. The hardware implements an additional
      "metrics" register and a new Fixed Counter 3 that measures pipeline
      "slots". The TopDown events can be calculated from them instead.
      
      Events
      ======
      
      The level 1 TopDown has four metrics. There is no event-code assigned to
      the TopDown metrics. Four metric events are exported as separate perf
      events, which map to the internal "metrics" counter register. Those
      events do not exist in hardware, but can be allocated by the scheduler.
      
      For the event mapping, a special 0x00 event code is used, which is
      reserved for fake events. The metric events start from umask 0x10.
      
      When setting up the metric events, they point to the Fixed Counter 3.
      They have to be specially handled.
      - Add the update_topdown_event() callback to read the additional metrics
        MSR and generate the metrics.
      - Add the set_topdown_event_period() callback to initialize metrics MSR
        and the fixed counter 3.
      - Add a variable n_metric_event to track the number of the accepted
        metrics events. The sharing between multiple users of the same metric
        without multiplexing is not allowed.
      - Only enable/disable the fixed counter 3 when there are no other active
        TopDown events, which avoids unnecessary writes to the fixed control
        register.
      - Disable the PMU when reading the metrics event. The metrics MSR and
        the fixed counter 3 are read separately. The values may be modified by
        an NMI.
      
      None of the four metric events supports sampling. Since they will be
      handled specially for event update, a flag PERF_X86_EVENT_TOPDOWN is
      introduced to indicate this case.
      
      The slots event can support both sampling and counting.
      For counting, the flag is also applied.
      For sampling, it will be handled normally as other normal events.
      
      Groups
      ======
      
      The slots event is required in a Topdown group.
      To avoid reading the METRICS register multiple times, the metrics and
      slots values can only be updated by the slots event in a group.
      All active slots and metrics events are updated in one pass.
      Therefore, the slots event must come before any metric events in a
      Topdown group, as the example below shows.
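      
      For example, a valid Topdown group can be opened from the command
      line like this (an illustrative invocation; the event names are the
      td_attr events exported by the follow-up Ice Lake patch):
      
        $ perf stat -e '{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}' -- ./workload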
      
      NMI
      ======
      
      The METRICS-related register may overflow, in which case bit 48 of the
      STATUS register is set. If so, PERF_METRICS and the fixed counter 3
      must be reset. The patch also updates all active slots and metrics
      events in the NMI handler.
      
      The update_topdown_event() has to read two registers separately. The
      values may be modified by an NMI. PMU has to be disabled before calling
      the function.
      
      RDPMC
      ======
      
      RDPMC is temporarily disabled. A later patch will enable it.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-9-kan.liang@linux.intel.com
      7b2c05a1
  11. 08 Jul 2020, 1 commit
  12. 02 Jul 2020, 1 commit
    • cpufreq: intel_pstate: Allow enable/disable energy efficiency · ed7bde7a
      Committed by Srinivas Pandruvada
      By default, the intel_pstate driver disables energy efficiency by
      setting MSR_IA32_POWER_CTL bit 19 for the Kaby Lake desktop CPU model
      in HWP mode. This CPU model is also shared by Coffee Lake desktop
      CPUs. This allows these systems to reach the maximum possible
      frequency, but it adds a power penalty, which some customers don't
      want. They want a way to enable/disable it dynamically.
      
      So, add an additional attribute "energy_efficiency" under
      /sys/devices/system/cpu/intel_pstate/ for these CPU models. It allows
      reading and writing bit 19 ("Disable Energy Efficiency Optimization")
      in the MSR IA32_POWER_CTL.
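      
      At its core the attribute toggles that one MSR bit; a minimal sketch
      (the helper name is illustrative, the MSR and bit are as described
      above):
      
        #define POWER_CTL_EE_DISABLE_BIT 19   /* "Disable EE Optimization" */
        
        static void set_energy_efficiency(bool enable)
        {
                u64 power_ctl;
        
                rdmsrl(MSR_IA32_POWER_CTL, power_ctl);
                if (enable)     /* EE on: clear the "disable" bit */
                        power_ctl &= ~BIT_ULL(POWER_CTL_EE_DISABLE_BIT);
                else
                        power_ctl |= BIT_ULL(POWER_CTL_EE_DISABLE_BIT);
                wrmsrl(MSR_IA32_POWER_CTL, power_ctl);
        }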
      
      This attribute is present in both HWP and non-HWP mode as this has an
      effect in both modes. Refer to Intel Software Developer's manual for
      details.
      
      The scope of this bit is package-wide, and these systems are
      single-package systems, so reading/writing the MSR on the current CPU
      is enough.
      
      The energy efficiency (EE) bit setting needs to be preserved during
      suspend/resume and CPU offline/online operations. To do this:
      - Restore the EE setting from the cpufreq resume() callback if it has
      changed from the system default.
      - By default, don't disable EE from the cpufreq init() callback for
      matching CPU models. Since the scope is package-wide and these are
      single-package systems, move the disable-EE calls from the init()
      callback to the intel_pstate_init() function, which is called only
      once.
      Suggested-by: Len Brown <lenb@kernel.org>
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      ed7bde7a
  13. 22 Jun 2020, 1 commit
    • x86/msr: Move the F15h MSRs where they belong · 99e40204
      Committed by Borislav Petkov
      1068ed45 ("x86/msr: Lift AMD family 0x15 power-specific MSRs")
      
      moved the three F15h power MSRs to the architectural list but that was
      wrong as they belong in the family 0x15 list. That also caused:
      
        In file included from trace/beauty/tracepoints/x86_msr.c:10:
        perf/trace/beauty/generated/x86_arch_MSRs_array.c:292:45: error: initialized field overwritten [-Werror=override-init]
          292 |  [0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC",
              |                                             ^~~~~~~~~~~
        perf/trace/beauty/generated/x86_arch_MSRs_array.c:292:45: note: (near initialization for 'x86_AMD_V_KVM_MSRs[640]')
      
      due to MSR_F15H_PTSC ending up being defined twice. Move them where they
      belong and drop the duplicate.
      
      Also, drop the respective tools/ changes of the msr-index.h copy the
      above commit added because perf tool developers prefer to go through
      those changes themselves in order to figure out whether changes to the
      kernel headers would need additional handling in perf.
      
      Fixes: 1068ed45 ("x86/msr: Lift AMD family 0x15 power-specific MSRs")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lkml.kernel.org/r/20200621163323.14e8533f@canb.auug.org.au
      99e40204
  14. 16 Jun 2020, 1 commit
  15. 28 May 2020, 1 commit
  16. 20 Apr 2020, 1 commit
    • x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation · 7e5b3c26
      Committed by Mark Gross
      SRBDS is an MDS-like speculative side channel that can leak bits from the
      random number generator (RNG) across cores and threads. New microcode
      serializes the processor access during the execution of RDRAND and
      RDSEED. This ensures that the shared buffer is overwritten before it is
      released for reuse.
      
      While it is present on all affected CPU models, the microcode mitigation
      is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the
      cases where TSX is not supported or has been disabled with TSX_CTRL.
      
      The mitigation is activated by default on affected processors and it
      increases latency for RDRAND and RDSEED instructions. Among other
      effects this will reduce throughput from /dev/urandom.
      
      * Enable administrator to configure the mitigation off when desired using
        either mitigations=off or srbds=off.
      
      * Export vulnerability status via sysfs
      
      * Rename file-scoped macros to apply for non-whitelist table initializations.
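      
      The toggle itself is a single MSR bit; roughly (a sketch assuming the
      IA32_MCU_OPT_CTRL interface introduced with the microcode update):
      
        #define MSR_IA32_MCU_OPT_CTRL   0x00000123
        #define RNGDS_MITG_DIS          BIT(0)  /* set = mitigation off */
        
        static void update_srbds_msr(bool mitigate)
        {
                u64 mcu_ctrl;
        
                rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
                if (mitigate)
                        mcu_ctrl &= ~RNGDS_MITG_DIS;
                else
                        mcu_ctrl |= RNGDS_MITG_DIS;
                wrmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
        }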
      
       [ bp: Massage,
         - s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g,
         - do not read arch cap MSR a second time in tsx_fused_off() - just pass it in,
         - flip check in cpu_set_bug_bits() to save an indentation level,
         - reflow comments.
         jpoimboe: s/Mitigated/Mitigation/ in user-visible strings
         tglx: Dropped the fused off magic for now
       ]
      Signed-off-by: Mark Gross <mgross@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
      7e5b3c26
  17. 21 Feb 2020, 1 commit
    • x86/split_lock: Enable split lock detection by kernel · 6650cdd9
      Committed by Peter Zijlstra (Intel)
      A split-lock occurs when an atomic instruction operates on data that spans
      two cache lines. In order to maintain atomicity the core takes a global bus
      lock.
      
      This is typically >1000 cycles slower than an atomic operation within a
      cache line. It also disrupts performance on other cores (which must wait
      for the bus lock to be released before their memory operations can
      complete). For real-time systems this may mean missing deadlines. For other
      systems it may just be very annoying.
      
      Some CPUs have the capability to raise an #AC trap when a split lock is
      attempted.
      
      Provide a command line option to give the user choices on how to handle
      this:
      
      split_lock_detect=
      	off	- not enabled (no traps for split locks)
      	warn	- warn once when an application does a
      		  split lock, but allow it to continue
      		  running.
      	fatal	- Send SIGBUS to applications that cause split lock
      
      On systems that support split lock detection the default is "warn". Note
      that if the kernel hits a split lock in any mode other than "off" it will
      OOPs.
      
      One implementation wrinkle is that the MSR to control the split lock
      detection is per-core, not per thread. This might result in some short
      lived races on HT systems in "warn" mode if Linux tries to enable on one
      thread while disabling on the other. Race analysis by Sean Christopherson:
      
        - Toggling of split-lock is only done in "warn" mode.  Worst case
          scenario of a race is that a misbehaving task will generate multiple
          #AC exceptions on the same instruction.  And this race will only occur
          if both siblings are running tasks that generate split-lock #ACs, e.g.
          a race where sibling threads are writing different values will only
          occur if CPUx is disabling split-lock after an #AC and CPUy is
          re-enabling split-lock after *its* previous task generated an #AC.
        - Transitioning between off/warn/fatal modes at runtime isn't supported
          and disabling is tracked per task, so hardware will always reach a steady
          state that matches the configured mode.  I.e. split-lock is guaranteed to
          be enabled in hardware once all _TIF_SLD threads have been scheduled out.
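      
      A condensed sketch of the "warn" path (names follow the patch; the
      kernel-mode case and error handling are elided):
      
        static void handle_split_lock_warn(struct pt_regs *regs)
        {
                pr_warn_ratelimited("split lock trap at address: 0x%lx\n",
                                    regs->ip);
        
                /* Clear TEST_CTRL[29] on this core; remember per task */
                sld_update_msr(false);
                set_tsk_thread_flag(current, TIF_SLD);
        }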
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Co-developed-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lore.kernel.org/r/20200126200535.GB30377@agluck-desk2.amr.corp.intel.com
      6650cdd9
  18. 20 Feb 2020, 1 commit
    • x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF · 21b5ee59
      Committed by Kim Phillips
      Commit
      
        aaf24884 ("perf/x86/msr: Add AMD IRPERF (Instructions Retired)
      		  performance counter")
      
      added support for access to the free-running counter via 'perf -e
      msr/irperf/', but when exercised, it always returns a 0 count:
      
      BEFORE:
      
        $ perf stat -e instructions,msr/irperf/ true
      
         Performance counter stats for 'true':
      
                   624,833      instructions
                         0      msr/irperf/
      
      Simply set its enable bit - HWCR bit 30 - to make it start counting.
      
      Enablement is restricted to all machines advertising IRPERF capability,
      except those susceptible to an erratum that makes the IRPERF return
      bad values.
      
      That erratum occurs in Family 17h models 00-1fh [1], but not in F17h
      models 20h and above [2].
      
      AFTER (on a family 17h model 31h machine):
      
        $ perf stat -e instructions,msr/irperf/ true
      
         Performance counter stats for 'true':
      
                   621,690      instructions
                   622,490      msr/irperf/
      
      [1] Revision Guide for AMD Family 17h Models 00h-0Fh Processors
      [2] Revision Guide for AMD Family 17h Models 30h-3Fh Processors
      
      The revision guides are available from the bugzilla Link below.
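      
      The enablement itself is a one-liner; a sketch (the erratum check
      from the patch is elided):
      
        #define MSR_K7_HWCR_IRPERF_EN_BIT 30
        
        static void enable_irperf(struct cpuinfo_x86 *c)
        {
                if (cpu_has(c, X86_FEATURE_IRPERF))
                        msr_set_bit(MSR_K7_HWCR, MSR_K7_HWCR_IRPERF_EN_BIT);
        }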
      
       [ bp: Massage commit message. ]
      
      Fixes: aaf24884 ("perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter")
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
      Link: http://lkml.kernel.org/r/20200214201805.13830-1-kim.phillips@amd.com
      21b5ee59
  19. 14 Jan 2020, 1 commit
    • x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR · 32ad73db
      Committed by Sean Christopherson
      As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
      are quite a mouthful, especially the VMX bits which must differentiate
      between enabling VMX inside and outside SMX (TXT) operation.  Rename the
      MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
      make them a little friendlier on the eyes.
      
      Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
      to match Intel's SDM, but a future patch will add a dedicated Kconfig,
      file and functions for the MSR. Using the full name for those assets is
      rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
      nomenclature is consistent throughout the kernel.
      
      Opportunistically, fix a few other annoyances with the defines:
      
        - Relocate the bit defines so that they immediately follow the MSR
          define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
        - Add whitespace around the block of feature control defines to make
          it clear they're all related.
        - Use BIT() instead of manually encoding the bit shift.
        - Use "VMX" instead of "VMXON" to match the SDM.
        - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
          be consistent with the kernel's verbiage used for all other feature
          control bits.  Note, the SDM refers to the LMCE bit as LMCE_ON,
          likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN.  Ignore
          the (literal) one-off usage of _ON, the SDM is simply "wrong".
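      
      The renamed block then reads roughly as follows (reconstructed from
      the description above; bit positions per the SDM):
      
        #define MSR_IA32_FEAT_CTL                       0x0000003a
        #define FEAT_CTL_LOCKED                         BIT(0)
        #define FEAT_CTL_VMX_ENABLED_INSIDE_SMX         BIT(1)
        #define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX        BIT(2)
        #define FEAT_CTL_LMCE_ENABLED                   BIT(20)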
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
      32ad73db
  20. 14 Nov 2019, 1 commit
  21. 04 Nov 2019, 1 commit
  22. 28 Oct 2019, 2 commits
    • x86/speculation/taa: Add mitigation for TSX Async Abort · 1b42f017
      Committed by Pawan Gupta
      TSX Async Abort (TAA) is a side channel vulnerability to the internal
      buffers in some Intel processors similar to Microarchitectural Data
      Sampling (MDS). In this case, certain loads may speculatively pass
      invalid data to dependent operations when an asynchronous abort
      condition is pending in a TSX transaction.
      
      This includes loads with no fault or assist condition. Such loads may
      speculatively expose stale data from the uarch data structures as in
      MDS. Scope of exposure is within the same-thread and cross-thread. This
      issue affects all current processors that support TSX, but do not have
      ARCH_CAP_TAA_NO (bit 8) set in MSR_IA32_ARCH_CAPABILITIES.
      
      On CPUs which have their IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0,
      CPUID.MD_CLEAR=1 and the MDS mitigation is clearing the CPU buffers
      using VERW or L1D_FLUSH, there is no additional mitigation needed for
      TAA. On affected CPUs with MDS_NO=1 this issue can be mitigated by
      disabling the Transactional Synchronization Extensions (TSX) feature.
      
      A new MSR IA32_TSX_CTRL in future and current processors after a
      microcode update can be used to control the TSX feature. There are two
      bits in that MSR:
      
      * TSX_CTRL_RTM_DISABLE disables the TSX sub-feature Restricted
      Transactional Memory (RTM).
      
      * TSX_CTRL_CPUID_CLEAR clears the RTM enumeration in CPUID. The other
      TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally
      disabled with updated microcode but still enumerated as present by
      CPUID(EAX=7).EBX{bit4}.
      
      The second mitigation approach is similar to MDS which is clearing the
      affected CPU buffers on return to user space and when entering a guest.
      Relevant microcode update is required for the mitigation to work.  More
      details on this approach can be found here:
      
        https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html
      
      The TSX feature can be controlled by the "tsx" command line parameter.
      If it is force-enabled then "Clear CPU buffers" (MDS mitigation) is
      deployed. The effective mitigation state can be read from sysfs.
      
       [ bp:
         - massage + comments cleanup
         - s/TAA_MITIGATION_TSX_DISABLE/TAA_MITIGATION_TSX_DISABLED/g - Josh.
         - remove partial TAA mitigation in update_mds_branch_idle() - Josh.
         - s/tsx_async_abort_cmdline/tsx_async_abort_parse_cmdline/g
       ]
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      1b42f017
    • x86/msr: Add the IA32_TSX_CTRL MSR · c2955f27
      Committed by Pawan Gupta
      Transactional Synchronization Extensions (TSX) may be used on certain
      processors as part of a speculative side channel attack.  A microcode
      update for existing processors that are vulnerable to this attack will
      add a new MSR - IA32_TSX_CTRL to allow the system administrator the
      option to disable TSX as one of the possible mitigations.
      
      The CPUs which get this new MSR after a microcode upgrade are the ones
      which do not set MSR_IA32_ARCH_CAPABILITIES.MDS_NO (bit 5) because those
      CPUs have CPUID.MD_CLEAR, i.e., the VERW implementation which clears all
      CPU buffers takes care of the TAA case as well.
      
        [ Note that future processors that are not vulnerable will also
          support the IA32_TSX_CTRL MSR. ]
      
      Add defines for the new IA32_TSX_CTRL MSR and its bits.
      
      TSX has two sub-features:
      
      1. Restricted Transactional Memory (RTM) is an explicitly-used feature
         where new instructions begin and end TSX transactions.
      2. Hardware Lock Elision (HLE) is implicitly used when certain kinds of
         "old" style locks are used by software.
      
      Bit 7 of the IA32_ARCH_CAPABILITIES indicates the presence of the
      IA32_TSX_CTRL MSR.
      
      There are two control bits in IA32_TSX_CTRL MSR:
      
        Bit 0: When set, it disables the Restricted Transactional Memory (RTM)
               sub-feature of TSX (will force all transactions to abort on the
      	 XBEGIN instruction).
      
        Bit 1: When set, it disables the enumeration of the RTM and HLE feature
               (i.e. it will make CPUID(EAX=7).EBX{bit4} and
      	  CPUID(EAX=7).EBX{bit11} read as 0).
      
      The other TSX sub-feature, Hardware Lock Elision (HLE), is
      unconditionally disabled by the new microcode but still enumerated
      as present by CPUID(EAX=7).EBX{bit4}, unless disabled by
      IA32_TSX_CTRL_MSR[1] - TSX_CTRL_CPUID_CLEAR.
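      
      The resulting defines look like this (a sketch matching the bit
      descriptions above; 0x122 is the MSR index assigned in the SDM):
      
        #define MSR_IA32_TSX_CTRL       0x00000122
        #define TSX_CTRL_RTM_DISABLE    BIT(0)  /* abort all RTM transactions */
        #define TSX_CTRL_CPUID_CLEAR    BIT(1)  /* hide RTM/HLE from CPUID */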
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
      Reviewed-by: Mark Gross <mgross@linux.intel.com>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      c2955f27
  23. 28 Aug 2019, 1 commit
  24. 20 Aug 2019, 1 commit
    • x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h · c49a0a80
      Committed by Tom Lendacky
      There have been reports of RDRAND issues after resuming from suspend on
      some AMD family 15h and family 16h systems. This issue stems from a BIOS
      not performing the proper steps during resume to ensure RDRAND continues
      to function properly.
      
      RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be
      reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND
      support using CPUID, including the kernel, will believe that RDRAND is
      not supported.
      
      Update the CPU initialization to clear the RDRAND CPUID bit for any family
      15h and 16h processor that supports RDRAND. If it is known that the family
      15h or family 16h system does not have an RDRAND resume issue or that the
      system will not be placed in suspend, the "rdrand=force" kernel parameter
      can be used to stop the clearing of the RDRAND CPUID bit.
      
      Additionally, update the suspend and resume path to save and restore the
      MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in
      place after resuming from suspend.
      
      Note, that clearing the RDRAND CPUID bit does not prevent a processor
      that normally supports the RDRAND instruction from executing it. So any
      code that determined the support based on family and model won't #UD.
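      
      A sketch of the clearing step (the suspend/resume save/restore and
      the "rdrand=force" plumbing are elided; the MSR index is the
      C001_1004 register named above):
      
        #define MSR_AMD64_CPUID_FN_1    0xc0011004
        
        static void clear_rdrand_cpuid_bit(void)
        {
                /* Bit 62 mirrors CPUID Fn00000001_ECX[30] (RDRAND) */
                msr_clear_bit(MSR_AMD64_CPUID_FN_1, 62);
        }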
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chen Yu <yu.c.chen@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>
      Cc: "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
      Cc: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: <stable@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "x86@kernel.org" <x86@kernel.org>
      Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.1566229943.git.thomas.lendacky@amd.com
      c49a0a80
  25. 19 Aug 2019, 1 commit
  26. 24 Jun 2019, 1 commit
    • x86/umwait: Initialize umwait control values · bd688c69
      Committed by Fenghua Yu
      umwait or tpause allows the processor to enter a light-weight
      power/performance-optimized state (C0.1) or an improved
      power/performance-optimized state (C0.2) for a period specified by
      the instruction, until the system time limit is reached, or until a
      store to the monitored address range occurs (for umwait).
      
      The IA32_UMWAIT_CONTROL MSR allows the OS to enable/disable C0.2 on
      the processor and to set the maximum time the processor can reside in
      C0.1 or C0.2.
      
      By default C0.2 is enabled so the user wait instructions can enter the
      C0.2 state to save more power with slower wakeup time.
      
      Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by
      default. A quote from Andy:
      
        "What I want to avoid is the case where it works dramatically differently
         on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may
         behave a bit differently if the max timeout is hit, and I'd like that
         path to get exercised widely by making it happen even on default
         configs."
      
      A sysfs interface to adjust the time and the C0.2 enablement is provided in
      a follow up change.
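      
      Putting the two pieces together, the boot-time programming amounts to
      something like this sketch (macro names per the tglx note below):
      
        #define MSR_IA32_UMWAIT_CONTROL                 0xe1
        #define MSR_IA32_UMWAIT_CONTROL_C02_DISABLE     BIT(0)
        #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK       (~0x03U)
        
        static void umwait_control_init(void)
        {
                /* Max wait 100000 cycles, C0.2 enabled (bit 0 left clear) */
                u32 ctrl = 100000 & MSR_IA32_UMWAIT_CONTROL_TIME_MASK;
        
                wrmsr(MSR_IA32_UMWAIT_CONTROL, ctrl, 0);
        }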
      
      [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to
        	MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as
        	mask throughout the code.
      	Massaged comments and changelog ]
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
      bd688c69
  27. 01 May 2019, 2 commits
  28. 16 Apr 2019, 1 commit
    • perf/x86/intel: Support adaptive PEBS v4 · c22497f5
      Committed by Kan Liang
      Adaptive PEBS is a new way to report PEBS sampling information. Instead
      of a fixed-size record for all PEBS events, it allows configuring the
      PEBS record to include only the information needed. Events can then opt
      in to use such an extended record, or stay with a basic record which
      only contains the IP.
      
      The major new feature is to support LBRs in PEBS record.
      Besides normal LBR, this allows (much faster) large PEBS, while still
      supporting callstacks through callstack LBR. So essentially a lot of
      profiling can now be done without frequent interrupts, dropping the
      overhead significantly.
      
      The main requirement still is to use a period, and not use frequency
      mode, because frequency mode requires reevaluating the frequency on each
      overflow.
      
      The floating point state (XMM) is also supported, which allows efficient
      profiling of FP function arguments.
      
      Introduce specific drain function to handle variable length records.
      Use a new callback to parse the new record format, and also handle the
      STATUS field now being at a different offset.
      
      Add code to set up the configuration register. Since there is only a
      single register, all events either get the full super set of all events,
      or only the basic record.
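      
      The group-enable bits in that register look roughly like this (names
      modeled on the kernel's PEBS_DATACFG_* defines; the driver programs
      the union of the groups required by all active events):
      
        #define MSR_PEBS_DATA_CFG       0x000003f2
        #define PEBS_DATACFG_MEMINFO    BIT_ULL(0)      /* memory access info */
        #define PEBS_DATACFG_GP         BIT_ULL(1)      /* general purpose regs */
        #define PEBS_DATACFG_XMMS       BIT_ULL(2)      /* XMM registers */
        #define PEBS_DATACFG_LBRS       BIT_ULL(3)      /* LBR entries */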
      Originally-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-6-kan.liang@linux.intel.com
      [ Renamed GPRS => GP. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c22497f5
  29. 07 Mar 2019, 2 commits
    • x86/speculation/mds: Add basic bug infrastructure for MDS · ed5194c2
      Committed by Andi Kleen
      Microarchitectural Data Sampling (MDS) is a class of side channel attacks
      on internal buffers in Intel CPUs. The variants are:
      
       - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
       - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
       - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
      
      MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
      dependent load (store-to-load forwarding) as an optimization. The forward
      can also happen to a faulting or assisting load operation for a different
      memory address, which can be exploited under certain conditions. Store
      buffers are partitioned between Hyper-Threads so cross thread forwarding is
      not possible. But if a thread enters or exits a sleep state the store
      buffer is repartitioned which can expose data from one thread to the other.
      
      MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
      L1 miss situations and to hold data which is returned or sent in response
      to a memory or I/O operation. Fill buffers can forward data to a load
      operation and also write data to the cache. When the fill buffer is
      deallocated it can retain the stale data of the preceding operations which
      can then be forwarded to a faulting or assisting load operation, which can
      be exploited under certain conditions. Fill buffers are shared between
      Hyper-Threads so cross thread leakage is possible.
      
      MLPDS leaks Load Port Data. Load ports are used to perform load operations
      from memory or I/O. The received data is then forwarded to the register
      file or a subsequent operation. In some implementations the Load Port can
      contain stale data from a previous operation which can be forwarded to
      faulting or assisting loads under certain conditions, which again can be
      exploited eventually. Load ports are shared between Hyper-Threads so cross
      thread leakage is possible.
      
      All variants have the same mitigation for single CPU thread case (SMT off),
      so the kernel can treat them as one MDS issue.
      
      Add the basic infrastructure to detect if the current CPU is affected by
      MDS.
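      
      In outline, the detection keys off ARCH_CAPABILITIES[MDS_NO] (a
      sketch; the real patch drives this from a CPU model table):
      
        #define ARCH_CAP_MDS_NO         BIT(5)
        
        static void check_mds(struct cpuinfo_x86 *c)
        {
                u64 ia32_cap = 0;
        
                if (cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES))
                        rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
        
                if (!(ia32_cap & ARCH_CAP_MDS_NO))
                        setup_force_cpu_bug(X86_BUG_MDS);
        }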
      
      [ tglx: Rewrote changelog ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Reviewed-by: Jon Masters <jcm@redhat.com>
      Tested-by: Jon Masters <jcm@redhat.com>
      ed5194c2
    • x86/msr-index: Cleanup bit defines · d8eabc37
      Committed by Thomas Gleixner
      Greg pointed out that speculation-related bit defines are using the
      (1 << N) format instead of BIT(N). Aside from that, (1 << N) is wrong
      as it should use 1UL at least.
      
      Clean it up.
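      
      An illustrative before/after pair (one of the speculation bits):
      
        /* Before: plain int shift */
        #define ARCH_CAP_RDCL_NO        (1 << 0)
        /* After: BIT() from <linux/bits.h> */
        #define ARCH_CAP_RDCL_NO        BIT(0)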
      
      [ Josh Poimboeuf: Fix tools build ]
      Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Reviewed-by: Jon Masters <jcm@redhat.com>
      Tested-by: Jon Masters <jcm@redhat.com>
      d8eabc37
  30. 06 Mar 2019, 1 commit
  31. 21 Dec 2018, 2 commits
    • KVM: x86: Add Intel PT virtualization work mode · f99e3daf
      Committed by Chao Peng
      Intel Processor Trace virtualization can work in one of two possible
      modes:
      
      a. System-Wide mode (default):
         When the host configures Intel PT to collect trace packets
         of the entire system, it can leave the relevant VMX controls
         clear to allow VMX-specific packets to provide information
         across VMX transitions.
         The KVM guest will not be aware of this feature in this mode,
         and both host and KVM guest traces will be output to the host
         buffer.
      
      b. Host-Guest mode:
         The host can configure trace-packet generation while in
         VMX non-root operation for guests and in root operation
         for normal native execution.
         Intel PT will be exposed to the KVM guest in this mode, and
         the trace output goes to the respective buffers of the host
         and the guest. In this mode, the PT state will be saved and
         PT disabled before VM-entry, and restored after VM-exit, when
         tracing a virtual machine.
      Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
      Signed-off-by: Luwei Kang <luwei.kang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f99e3daf
    • perf/x86/intel/pt: Add new bit definitions for PT MSRs · 69843a91
      Committed by Luwei Kang
      Add bit definitions for Intel PT MSRs to support trace output
      directed to the memory subsystem and a count of packet bytes that
      have been sent out.
      
      These are required by the upcoming PT support in KVM guests for
      MSR read/write emulation.
      Signed-off-by: Luwei Kang <luwei.kang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      69843a91