1. 22 8月, 2023 2 次提交
  2. 07 12月, 2022 1 次提交
  3. 18 11月, 2022 1 次提交
  4. 11 11月, 2022 3 次提交
    • S
      x86/msr: Add PerfCntrGlobal* registers · f29be156
      Sandipan Das 提交于
      mainline inclusion
      from mainline-v5.19
      commit 089be16d
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5S3WV
      CVE: NA
      
      -------------------------------------------------
      
      Add MSR definitions that will be used to enable the new AMD
      Performance Monitoring Version 2 (PerfMonV2) features. These
      include:
      
        * Performance Counter Global Control (PerfCntrGlobalCtl)
        * Performance Counter Global Status (PerfCntrGlobalStatus)
        * Performance Counter Global Status Clear (PerfCntrGlobalStatusClr)
      
      The new Performance Counter Global Control and Status MSRs
      provide an interface for enabling or disabling multiple
      counters at the same time and for testing overflow without
      probing the individual registers for each PMC.
      
      The availability of these registers is indicated through the
      PerfMonV2 feature bit of CPUID leaf 0x80000022 EAX.
      Signed-off-by: NSandipan Das <sandipan.das@amd.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/cdc0d8f75bd519848731b5c64d924f5a0619a573.1650515382.git.sandipan.das@amd.comSigned-off-by: NXie Haocheng <haocheng.xie@amd.com>
      f29be156
    • X
      perf/x86/amd: Add AMD Fam19h Branch Sampling support · f760d3d7
      Xie Haocheng 提交于
      mainline inclusion
      from mainline-v5.19
      commit ada54345
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5S3WV
      CVE: NA
      
      -------------------------------------------------
      
      Add support for the AMD Fam19h 16-deep branch sampling feature as
      described in the AMD PPR Fam19h Model 01h Revision B1.  This is a model
      specific extension. It is not an architected AMD feature.
      
      The Branch Sampling (BRS) operates with a 16-deep saturating buffer in MSR
      registers. There is no branch type filtering. All control flow changes are
      captured. BRS relies on specific programming of the core PMU of Fam19h.  In
      particular, the following requirements must be met:
       - the sampling period be greater than 16 (BRS depth)
       - the sampling period must use a fixed and not frequency mode
      
      BRS interacts with the NMI interrupt as well. Because enabling BRS is
      expensive, it is only activated after P event occurrences, where P is the
      desired sampling period.  At P occurrences of the event, the counter
      overflows, the CPU catches the interrupt, activates BRS for 16 branches until
      it saturates, and then delivers the NMI to the kernel.  Between the overflow
      and the time BRS activates more branches may be executed skewing the period.
      All along, the sampling event keeps counting. The skid may be attenuated by
      reducing the sampling period by 16 (subsequent patch).
      
      BRS is integrated into perf_events seamlessly via the same
      PERF_RECORD_BRANCH_STACK sample format. BRS generates perf_branch_entry
      records in the sampling buffer. No prediction information is supported. The
      branches are stored in reverse order of execution.  The most recent branch is
      the first entry in each record.
      
      No modification to the perf tool is necessary.
      
      BRS can be used with any sampling event. However, it is recommended to use
      the RETIRED_BRANCH_INSTRUCTIONS event because it matches what the BRS
      captures.
      
      $ perf record -b -c 1000037 -e cpu/event=0xc2,name=ret_br_instructions/ test
      
      $ perf report -D
      56531696056126 0x193c000 [0x1a8]: PERF_RECORD_SAMPLE(IP, 0x2): 18122/18230: 0x401d24 period: 1000037 addr: 0
      ... branch stack: nr:16
      .....  0: 0000000000401d24 -> 0000000000401d5a 0 cycles      0
      .....  1: 0000000000401d5c -> 0000000000401d24 0 cycles      0
      .....  2: 0000000000401d22 -> 0000000000401d5c 0 cycles      0
      .....  3: 0000000000401d5e -> 0000000000401d22 0 cycles      0
      .....  4: 0000000000401d20 -> 0000000000401d5e 0 cycles      0
      .....  5: 0000000000401d3e -> 0000000000401d20 0 cycles      0
      .....  6: 0000000000401d42 -> 0000000000401d3e 0 cycles      0
      .....  7: 0000000000401d3c -> 0000000000401d42 0 cycles      0
      .....  8: 0000000000401d44 -> 0000000000401d3c 0 cycles      0
      .....  9: 0000000000401d3a -> 0000000000401d44 0 cycles      0
      ..... 10: 0000000000401d46 -> 0000000000401d3a 0 cycles      0
      ..... 11: 0000000000401d38 -> 0000000000401d46 0 cycles      0
      ..... 12: 0000000000401d48 -> 0000000000401d38 0 cycles      0
      ..... 13: 0000000000401d36 -> 0000000000401d48 0 cycles      0
      ..... 14: 0000000000401d4a -> 0000000000401d36 0 cycles      0
      ..... 15: 0000000000401d34 -> 0000000000401d4a 0 cycles      0
       ... thread: test:18230
       ...... dso: test
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220322221517.2510440-4-eranian@google.comSigned-off-by: NXie Haocheng <haocheng.xie@amd.com>
      f760d3d7
    • T
      x86/cpu: Add VM page flush MSR availablility as a CPUID feature · 66cccc91
      Tom Lendacky 提交于
      mainline inclusion
      from mainline-v5.11
      commit 69372cf0
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5S3WV
      CVE: NA
      
      -------------------------------------------------
      
      On systems that do not have hardware enforced cache coherency between
      encrypted and unencrypted mappings of the same physical page, the
      hypervisor can use the VM page flush MSR (0xc001011e) to flush the cache
      contents of an SEV guest page. When a small number of pages are being
      flushed, this can be used in place of issuing a WBINVD across all CPUs.
      
      CPUID 0x8000001f_eax[2] is used to determine if the VM page flush MSR is
      available. Add a CPUID feature to indicate it is supported and define the
      MSR.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <f1966379e31f9b208db5257509c4a089a87d33d0.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NXie Haocheng <haocheng.xie@amd.com>
      66cccc91
  5. 31 10月, 2022 1 次提交
  6. 08 10月, 2022 1 次提交
  7. 21 9月, 2022 1 次提交
    • D
      x86/speculation: Add RSB VM Exit protections · 30b1e4cb
      Daniel Sneddon 提交于
      stable inclusion
      from stable-v5.10.136
      commit 509c2c9fe75ea7493eebbb6bb2f711f37530ae19
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5N1SO
      CVE: CVE-2022-26373
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=509c2c9fe75ea7493eebbb6bb2f711f37530ae19
      
      --------------------------------
      
      commit 2b129932 upstream.
      
      tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
      documented for RET instructions after VM exits. Mitigate it with a new
      one-entry RSB stuffing mechanism and a new LFENCE.
      
      == Background ==
      
      Indirect Branch Restricted Speculation (IBRS) was designed to help
      mitigate Branch Target Injection and Speculative Store Bypass, i.e.
      Spectre, attacks. IBRS prevents software run in less privileged modes
      from affecting branch prediction in more privileged modes. IBRS requires
      the MSR to be written on every privilege level change.
      
      To overcome some of the performance issues of IBRS, Enhanced IBRS was
      introduced.  eIBRS is an "always on" IBRS, in other words, just turn
      it on once instead of writing the MSR on every privilege level change.
      When eIBRS is enabled, more privileged modes should be protected from
      less privileged modes, including protecting VMMs from guests.
      
      == Problem ==
      
      Here's a simplification of how guests are run on Linux' KVM:
      
      void run_kvm_guest(void)
      {
      	// Prepare to run guest
      	VMRESUME();
      	// Clean up after guest runs
      }
      
      The execution flow for that would look something like this to the
      processor:
      
      1. Host-side: call run_kvm_guest()
      2. Host-side: VMRESUME
      3. Guest runs, does "CALL guest_function"
      4. VM exit, host runs again
      5. Host might make some "cleanup" function calls
      6. Host-side: RET from run_kvm_guest()
      
      Now, when back on the host, there are a couple of possible scenarios of
      post-guest activity the host needs to do before executing host code:
      
      * on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
      touched and Linux has to do a 32-entry stuffing.
      
      * on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
      IBRS=1 shortly after VM exit, has a documented side effect of flushing
      the RSB except in this PBRSB situation where the software needs to stuff
      the last RSB entry "by hand".
      
      IOW, with eIBRS supported, host RET instructions should no longer be
      influenced by guest behavior after the host retires a single CALL
      instruction.
      
      However, if the RET instructions are "unbalanced" with CALLs after a VM
      exit as is the RET in #6, it might speculatively use the address for the
      instruction after the CALL in #3 as an RSB prediction. This is a problem
      since the (untrusted) guest controls this address.
      
      Balanced CALL/RET instruction pairs such as in step #5 are not affected.
      
      == Solution ==
      
      The PBRSB issue affects a wide variety of Intel processors which
      support eIBRS. But not all of them need mitigation. Today,
      X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
      PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
      eIBRS systems which enable legacy IBRS explicitly.
      
      However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
      and most of them need a new mitigation.
      
      Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
      which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.
      
      The lighter-weight mitigation performs a CALL instruction which is
      immediately followed by a speculative execution barrier (INT3). This
      steers speculative execution to the barrier -- just like a retpoline
      -- which ensures that speculation can never reach an unbalanced RET.
      Then, ensure this CALL is retired before continuing execution with an
      LFENCE.
      
      In other words, the window of exposure is opened at VM exit where RET
      behavior is troublesome. While the window is open, force RSB predictions
      sampling for RET targets to a dead end at the INT3. Close the window
      with the LFENCE.
      
      There is a subset of eIBRS systems which are not vulnerable to PBRSB.
      Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
      Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.
      
        [ bp: Massage, incorporate review comments from Andy Cooper. ]
      Signed-off-by: NDaniel Sneddon <daniel.sneddon@linux.intel.com>
      Co-developed-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      conflict:
          arch/x86/include/asm/cpufeatures.h
      Signed-off-by: NChen Jiahao <chenjiahao16@huawei.com>
      Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com>
      Reviewed-by: NLiao Chang <liaochang1@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      30b1e4cb
  8. 20 9月, 2022 3 次提交
  9. 29 7月, 2022 1 次提交
    • F
      x86/traps: Handle #DB for bus lock · c8db5890
      Fenghua Yu 提交于
      mainline inclusion
      from mainline-v5.12-rc4
      commit ebb1064e
      category: feature
      bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5G10C
      CVE: NA
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
      commit/?id=ebb1064e
      
      Intel-SIG: commit ebb1064e x86/traps: Handle #DB for bus lock
      
      --------------------------------
      
      Bus locks degrade performance for the whole system, not just for the CPU
      that requested the bus lock. Two CPU features "#AC for split lock" and
      "#DB for bus lock" provide hooks so that the operating system may choose
      one of several mitigation strategies.
      
      bus lock feature to cover additional situations with new options to
      mitigate.
      
      split_lock_detect=
      		#AC for split lock		#DB for bus lock
      
      off		Do nothing			Do nothing
      
      warn		Kernel OOPs			Warn once per task and
      		Warn once per task and		and continues to run.
      		disable future checking
      	 	When both features are
      		supported, warn in #AC
      
      fatal		Kernel OOPs			Send SIGBUS to user.
      		Send SIGBUS to user
      		When both features are
      		supported, fatal in #AC
      
      ratelimit:N	Do nothing			Limit bus lock rate to
      						N per second in the
      						current non-root user.
      
      Default option is "warn".
      
      Hardware only generates #DB for bus lock detect when CPL>0 to avoid
      nested #DB from multiple bus locks while the first #DB is being handled.
      So no need to handle #DB for bus lock detected in the kernel.
      
      while #AC for split lock is enabled by split lock detection bit 29 in
      TEST_CTRL MSR.
      
      Both breakpoint and bus lock in the same instruction can trigger one #DB.
      The bus lock is handled before the breakpoint in the #DB handler.
      
      Delivery of #DB for bus lock in userspace clears DR6[11], which is set by
      the #DB handler right after reading DR6.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NTony Luck <tony.luck@intel.com>
      Link: https://lore.kernel.org/r/20210322135325.682257-3-fenghua.yu@intel.com
      
      (cherry picked from commit ebb1064e)
      Signed-off-by: NEthan Zhao <haifeng.zhao@linux.intel.com>
      c8db5890
  10. 08 7月, 2022 1 次提交
  11. 06 7月, 2022 2 次提交
  12. 22 2月, 2022 2 次提交
  13. 29 12月, 2021 2 次提交
  14. 18 9月, 2020 1 次提交
  15. 10 9月, 2020 1 次提交
  16. 08 9月, 2020 2 次提交
  17. 18 8月, 2020 2 次提交
    • K
      perf/x86/intel: Support TopDown metrics on Ice Lake · 59a854e2
      Kan Liang 提交于
      Ice Lake supports the hardware TopDown metrics feature, which can free
      up the scarce GP counters.
      
      Update the event constraints for the metrics events. The metric counters
      do not exist, which are mapped to a dummy offset. The sharing between
      multiple users of the same metric without multiplexing is not allowed.
      
      Implement set_topdown_event_period for Ice Lake. The values in
      PERF_METRICS MSR are derived from the fixed counter 3. Both registers
      should start from zero.
      
      Implement update_topdown_event for Ice Lake. The metric is reported by
      multiplying the metric (fraction) with slots. To maintain accurate
      measurements, both registers are cleared for each update. The fixed
      counter 3 should always be cleared before the PERF_METRICS.
      
      Implement td_attr for the new metrics events and the new slots fixed
      counter. Make them visible to the perf user tools.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-11-kan.liang@linux.intel.com
      59a854e2
    • K
      perf/x86/intel: Generic support for hardware TopDown metrics · 7b2c05a1
      Kan Liang 提交于
      Intro
      =====
      
      The TopDown Microarchitecture Analysis (TMA) Method is a structured
      analysis methodology to identify critical performance bottlenecks in
      out-of-order processors. Current perf has supported the method.
      
      The method works well, but there is one problem. To collect the TopDown
      events, several GP counters have to be used. If a user wants to collect
      other events at the same time, the multiplexing probably be triggered,
      which impacts the accuracy.
      
      To free up the scarce GP counters, the hardware TopDown metrics feature
      is introduced from Ice Lake. The hardware implements an additional
      "metrics" register and a new Fixed Counter 3 that measures pipeline
      "slots". The TopDown events can be calculated from them instead.
      
      Events
      ======
      
      The level 1 TopDown has four metrics. There is no event-code assigned to
      the TopDown metrics. Four metric events are exported as separate perf
      events, which map to the internal "metrics" counter register. Those
      events do not exist in hardware, but can be allocated by the scheduler.
      
      For the event mapping, a special 0x00 event code is used, which is
      reserved for fake events. The metric events start from umask 0x10.
      
      When setting up the metric events, they point to the Fixed Counter 3.
      They have to be specially handled.
      - Add the update_topdown_event() callback to read the additional metrics
        MSR and generate the metrics.
      - Add the set_topdown_event_period() callback to initialize metrics MSR
        and the fixed counter 3.
      - Add a variable n_metric_event to track the number of the accepted
        metrics events. The sharing between multiple users of the same metric
        without multiplexing is not allowed.
      - Only enable/disable the fixed counter 3 when there are no other active
        TopDown events, which avoid the unnecessary writing of the fixed
        control register.
      - Disable the PMU when reading the metrics event. The metrics MSR and
        the fixed counter 3 are read separately. The values may be modified by
        an NMI.
      
      All four metric events don't support sampling. Since they will be
      handled specially for event update, a flag PERF_X86_EVENT_TOPDOWN is
      introduced to indicate this case.
      
      The slots event can support both sampling and counting.
      For counting, the flag is also applied.
      For sampling, it will be handled normally as other normal events.
      
      Groups
      ======
      
      The slots event is required in a Topdown group.
      To avoid reading the METRICS register multiple times, the metrics and
      slots value can only be updated by slots event in a group.
      All active slots and metrics events will be updated one time.
      Therefore, the slots event must be before any metric events in a Topdown
      group.
      
      NMI
      ======
      
      The METRICS related register may be overflow. The bit 48 of the STATUS
      register will be set. If so, PERF_METRICS and Fixed counter 3 are
      required to be reset. The patch also update all active slots and
      metrics events in the NMI handler.
      
      The update_topdown_event() has to read two registers separately. The
      values may be modified by an NMI. PMU has to be disabled before calling
      the function.
      
      RDPMC
      ======
      
      RDPMC is temporarily disabled. A later patch will enable it.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-9-kan.liang@linux.intel.com
      7b2c05a1
  18. 08 7月, 2020 1 次提交
  19. 02 7月, 2020 1 次提交
    • S
      cpufreq: intel_pstate: Allow enable/disable energy efficiency · ed7bde7a
      Srinivas Pandruvada 提交于
      By default intel_pstate the driver disables energy efficiency by setting
      MSR_IA32_POWER_CTL bit 19 for Kaby Lake desktop CPU model in HWP mode.
      This CPU model is also shared by Coffee Lake desktop CPUs. This allows
      these systems to reach maximum possible frequency. But this adds power
      penalty, which some customers don't want. They want some way to enable/
      disable dynamically.
      
      So, add an additional attribute "energy_efficiency" under
      /sys/devices/system/cpu/intel_pstate/ for these CPU models. This allows
      to read and write bit 19 ("Disable Energy Efficiency Optimization") in
      the MSR IA32_POWER_CTL.
      
      This attribute is present in both HWP and non-HWP mode as this has an
      effect in both modes. Refer to Intel Software Developer's manual for
      details.
      
      The scope of this bit is package wide. Also these systems are single
      package systems. So read/write MSR on the current CPU is enough.
      
      The energy efficiency (EE) bit setting needs to be preserved during
      suspend/resume and CPU offline/online operation. To do this:
      - Restoring the EE setting from the cpufreq resume() callback, if there
      is change from the system default.
      - By default, don't disable EE from cpufreq init() callback for matching
      CPU models. Since the scope is package wide and is a single package
      system, move the disable EE calls from init() callback to
      intel_pstate_init() function, which is called only once.
      Suggested-by: NLen Brown <lenb@kernel.org>
      Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      ed7bde7a
  20. 22 6月, 2020 1 次提交
    • B
      x86/msr: Move the F15h MSRs where they belong · 99e40204
      Borislav Petkov 提交于
      1068ed45 ("x86/msr: Lift AMD family 0x15 power-specific MSRs")
      
      moved the three F15h power MSRs to the architectural list but that was
      wrong as they belong in the family 0x15 list. That also caused:
      
        In file included from trace/beauty/tracepoints/x86_msr.c:10:
        perf/trace/beauty/generated/x86_arch_MSRs_array.c:292:45: error: initialized field overwritten [-Werror=override-init]
          292 |  [0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC",
              |                                             ^~~~~~~~~~~
        perf/trace/beauty/generated/x86_arch_MSRs_array.c:292:45: note: (near initialization for 'x86_AMD_V_KVM_MSRs[640]')
      
      due to MSR_F15H_PTSC ending up being defined twice. Move them where they
      belong and drop the duplicate.
      
      Also, drop the respective tools/ changes of the msr-index.h copy the
      above commit added because perf tool developers prefer to go through
      those changes themselves in order to figure out whether changes to the
      kernel headers would need additional handling in perf.
      
      Fixes: 1068ed45 ("x86/msr: Lift AMD family 0x15 power-specific MSRs")
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lkml.kernel.org/r/20200621163323.14e8533f@canb.auug.org.au
      99e40204
  21. 16 6月, 2020 1 次提交
  22. 28 5月, 2020 1 次提交
  23. 20 4月, 2020 1 次提交
    • M
      x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation · 7e5b3c26
      Mark Gross 提交于
      SRBDS is an MDS-like speculative side channel that can leak bits from the
      random number generator (RNG) across cores and threads. New microcode
      serializes the processor access during the execution of RDRAND and
      RDSEED. This ensures that the shared buffer is overwritten before it is
      released for reuse.
      
      While it is present on all affected CPU models, the microcode mitigation
      is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the
      cases where TSX is not supported or has been disabled with TSX_CTRL.
      
      The mitigation is activated by default on affected processors and it
      increases latency for RDRAND and RDSEED instructions. Among other
      effects this will reduce throughput from /dev/urandom.
      
      * Enable administrator to configure the mitigation off when desired using
        either mitigations=off or srbds=off.
      
      * Export vulnerability status via sysfs
      
      * Rename file-scoped macros to apply for non-whitelist table initializations.
      
       [ bp: Massage,
         - s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g,
         - do not read arch cap MSR a second time in tsx_fused_off() - just pass it in,
         - flip check in cpu_set_bug_bits() to save an indentation level,
         - reflow comments.
         jpoimboe: s/Mitigated/Mitigation/ in user-visible strings
         tglx: Dropped the fused off magic for now
       ]
      Signed-off-by: NMark Gross <mgross@linux.intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NTony Luck <tony.luck@intel.com>
      Reviewed-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Tested-by: NNeelima Krishnan <neelima.krishnan@intel.com>
      7e5b3c26
  24. 21 2月, 2020 1 次提交
    • P
      x86/split_lock: Enable split lock detection by kernel · 6650cdd9
      Peter Zijlstra (Intel) 提交于
      A split-lock occurs when an atomic instruction operates on data that spans
      two cache lines. In order to maintain atomicity the core takes a global bus
      lock.
      
      This is typically >1000 cycles slower than an atomic operation within a
      cache line. It also disrupts performance on other cores (which must wait
      for the bus lock to be released before their memory operations can
      complete). For real-time systems this may mean missing deadlines. For other
      systems it may just be very annoying.
      
      Some CPUs have the capability to raise an #AC trap when a split lock is
      attempted.
      
      Provide a command line option to give the user choices on how to handle
      this:
      
      split_lock_detect=
      	off	- not enabled (no traps for split locks)
      	warn	- warn once when an application does a
      		  split lock, but allow it to continue
      		  running.
      	fatal	- Send SIGBUS to applications that cause split lock
      
      On systems that support split lock detection the default is "warn". Note
      that if the kernel hits a split lock in any mode other than "off" it will
      OOPs.
      
      One implementation wrinkle is that the MSR to control the split lock
      detection is per-core, not per thread. This might result in some short
      lived races on HT systems in "warn" mode if Linux tries to enable on one
      thread while disabling on the other. Race analysis by Sean Christopherson:
      
        - Toggling of split-lock is only done in "warn" mode.  Worst case
          scenario of a race is that a misbehaving task will generate multiple
          #AC exceptions on the same instruction.  And this race will only occur
          if both siblings are running tasks that generate split-lock #ACs, e.g.
          a race where sibling threads are writing different values will only
          occur if CPUx is disabling split-lock after an #AC and CPUy is
          re-enabling split-lock after *its* previous task generated an #AC.
        - Transitioning between off/warn/fatal modes at runtime isn't supported
          and disabling is tracked per task, so hardware will always reach a steady
          state that matches the configured mode.  I.e. split-lock is guaranteed to
          be enabled in hardware once all _TIF_SLD threads have been scheduled out.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Co-developed-by: NFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Co-developed-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Link: https://lore.kernel.org/r/20200126200535.GB30377@agluck-desk2.amr.corp.intel.com
      6650cdd9
  25. 20 2月, 2020 1 次提交
    • K
      x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF · 21b5ee59
      Kim Phillips 提交于
      Commit
      
        aaf24884 ("perf/x86/msr: Add AMD IRPERF (Instructions Retired)
      		  performance counter")
      
      added support for access to the free-running counter via 'perf -e
      msr/irperf/', but when exercised, it always returns a 0 count:
      
      BEFORE:
      
        $ perf stat -e instructions,msr/irperf/ true
      
         Performance counter stats for 'true':
      
                   624,833      instructions
                         0      msr/irperf/
      
      Simply set its enable bit - HWCR bit 30 - to make it start counting.
      
      Enablement is restricted to all machines advertising IRPERF capability,
      except those susceptible to an erratum that makes the IRPERF return
      bad values.
      
      That erratum occurs in Family 17h models 00-1fh [1], but not in F17h
      models 20h and above [2].
      
      AFTER (on a family 17h model 31h machine):
      
        $ perf stat -e instructions,msr/irperf/ true
      
         Performance counter stats for 'true':
      
                   621,690      instructions
                   622,490      msr/irperf/
      
      [1] Revision Guide for AMD Family 17h Models 00h-0Fh Processors
      [2] Revision Guide for AMD Family 17h Models 30h-3Fh Processors
      
      The revision guides are available from the bugzilla Link below.
      
       [ bp: Massage commit message. ]
      
      Fixes: aaf24884 ("perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter")
      Signed-off-by: NKim Phillips <kim.phillips@amd.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
      Link: http://lkml.kernel.org/r/20200214201805.13830-1-kim.phillips@amd.com
      21b5ee59
  26. 14 1月, 2020 1 次提交
    • S
      x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR · 32ad73db
      Sean Christopherson 提交于
      As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
      are quite a mouthful, especially the VMX bits which must differentiate
      between enabling VMX inside and outside SMX (TXT) operation.  Rename the
      MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
      make them a little friendlier on the eyes.
      
      Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
      to match Intel's SDM, but a future patch will add a dedicated Kconfig,
      file and functions for the MSR. Using the full name for those assets is
      rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
      nomenclature is consistent throughout the kernel.
      
      Opportunistically, fix a few other annoyances with the defines:
      
        - Relocate the bit defines so that they immediately follow the MSR
          define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
        - Add whitespace around the block of feature control defines to make
          it clear they're all related.
        - Use BIT() instead of manually encoding the bit shift.
        - Use "VMX" instead of "VMXON" to match the SDM.
        - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
          be consistent with the kernel's verbiage used for all other feature
          control bits.  Note, the SDM refers to the LMCE bit as LMCE_ON,
          likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN.  Ignore
          the (literal) one-off usage of _ON, the SDM is simply "wrong".
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
      32ad73db
  27. 14 11月, 2019 1 次提交
  28. 04 11月, 2019 1 次提交
  29. 28 10月, 2019 2 次提交
    • P
      x86/speculation/taa: Add mitigation for TSX Async Abort · 1b42f017
      Pawan Gupta 提交于
      TSX Async Abort (TAA) is a side channel vulnerability to the internal
      buffers in some Intel processors similar to Microachitectural Data
      Sampling (MDS). In this case, certain loads may speculatively pass
      invalid data to dependent operations when an asynchronous abort
      condition is pending in a TSX transaction.
      
      This includes loads with no fault or assist condition. Such loads may
      speculatively expose stale data from the uarch data structures as in
      MDS. Scope of exposure is within the same-thread and cross-thread. This
      issue affects all current processors that support TSX, but do not have
      ARCH_CAP_TAA_NO (bit 8) set in MSR_IA32_ARCH_CAPABILITIES.
      
      On CPUs which have their IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0,
      CPUID.MD_CLEAR=1 and the MDS mitigation is clearing the CPU buffers
      using VERW or L1D_FLUSH, there is no additional mitigation needed for
      TAA. On affected CPUs with MDS_NO=1 this issue can be mitigated by
      disabling the Transactional Synchronization Extensions (TSX) feature.
      
      A new MSR IA32_TSX_CTRL in future and current processors after a
      microcode update can be used to control the TSX feature. There are two
      bits in that MSR:
      
      * TSX_CTRL_RTM_DISABLE disables the TSX sub-feature Restricted
      Transactional Memory (RTM).
      
      * TSX_CTRL_CPUID_CLEAR clears the RTM enumeration in CPUID. The other
      TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally
      disabled with updated microcode but still enumerated as present by
      CPUID(EAX=7).EBX{bit4}.
      
      The second mitigation approach is similar to MDS which is clearing the
      affected CPU buffers on return to user space and when entering a guest.
      Relevant microcode update is required for the mitigation to work.  More
      details on this approach can be found here:
      
        https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html
      
      The TSX feature can be controlled by the "tsx" command line parameter.
      If it is force-enabled then "Clear CPU buffers" (MDS mitigation) is
      deployed. The effective mitigation state can be read from sysfs.
      
       [ bp:
         - massage + comments cleanup
         - s/TAA_MITIGATION_TSX_DISABLE/TAA_MITIGATION_TSX_DISABLED/g - Josh.
         - remove partial TAA mitigation in update_mds_branch_idle() - Josh.
         - s/tsx_async_abort_cmdline/tsx_async_abort_parse_cmdline/g
       ]
      Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      1b42f017
    • P
      x86/msr: Add the IA32_TSX_CTRL MSR · c2955f27
      Pawan Gupta 提交于
      Transactional Synchronization Extensions (TSX) may be used on certain
      processors as part of a speculative side channel attack.  A microcode
      update for existing processors that are vulnerable to this attack will
      add a new MSR - IA32_TSX_CTRL to allow the system administrator the
      option to disable TSX as one of the possible mitigations.
      
      The CPUs which get this new MSR after a microcode upgrade are the ones
      which do not set MSR_IA32_ARCH_CAPABILITIES.MDS_NO (bit 5) because those
      CPUs have CPUID.MD_CLEAR, i.e., the VERW implementation which clears all
      CPU buffers takes care of the TAA case as well.
      
        [ Note that future processors that are not vulnerable will also
          support the IA32_TSX_CTRL MSR. ]
      
      Add defines for the new IA32_TSX_CTRL MSR and its bits.
      
      TSX has two sub-features:
      
      1. Restricted Transactional Memory (RTM) is an explicitly-used feature
         where new instructions begin and end TSX transactions.
      2. Hardware Lock Elision (HLE) is implicitly used when certain kinds of
         "old" style locks are used by software.
      
      Bit 7 of the IA32_ARCH_CAPABILITIES indicates the presence of the
      IA32_TSX_CTRL MSR.
      
      There are two control bits in IA32_TSX_CTRL MSR:
      
        Bit 0: When set, it disables the Restricted Transactional Memory (RTM)
               sub-feature of TSX (will force all transactions to abort on the
      	 XBEGIN instruction).
      
        Bit 1: When set, it disables the enumeration of the RTM and HLE feature
               (i.e. it will make CPUID(EAX=7).EBX{bit4} and
      	  CPUID(EAX=7).EBX{bit11} read as 0).
      
      The other TSX sub-feature, Hardware Lock Elision (HLE), is
      unconditionally disabled by the new microcode but still enumerated
      as present by CPUID(EAX=7).EBX{bit4}, unless disabled by
      IA32_TSX_CTRL_MSR[1] - TSX_CTRL_CPUID_CLEAR.
      Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NNeelima Krishnan <neelima.krishnan@intel.com>
      Reviewed-by: NMark Gross <mgross@linux.intel.com>
      Reviewed-by: NTony Luck <tony.luck@intel.com>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      c2955f27