1. 22 Aug, 2023 2 commits
  2. 20 Sep, 2022 1 commit
    • x86/speculation: Add RSB VM Exit protections · 3838336f
      Daniel Sneddon committed
      stable inclusion
      from stable-v5.10.136
      commit 509c2c9fe75ea7493eebbb6bb2f711f37530ae19
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5N1SO
      CVE: CVE-2022-26373
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=509c2c9fe75ea7493eebbb6bb2f711f37530ae19
      
      --------------------------------
      
      commit 2b129932 upstream.
      
      tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
      documented for RET instructions after VM exits. Mitigate it with a new
      one-entry RSB stuffing mechanism and a new LFENCE.
      
      == Background ==
      
      Indirect Branch Restricted Speculation (IBRS) was designed to help
      mitigate Branch Target Injection and Speculative Store Bypass, i.e.
      Spectre, attacks. IBRS prevents software run in less privileged modes
      from affecting branch prediction in more privileged modes. IBRS requires
      the MSR to be written on every privilege level change.
      
      To overcome some of the performance issues of IBRS, Enhanced IBRS was
      introduced.  eIBRS is an "always on" IBRS, in other words, just turn
      it on once instead of writing the MSR on every privilege level change.
      When eIBRS is enabled, more privileged modes should be protected from
      less privileged modes, including protecting VMMs from guests.
      
      == Problem ==
      
      Here's a simplification of how guests are run on Linux' KVM:
      
      void run_kvm_guest(void)
      {
      	// Prepare to run guest
      	VMRESUME();
      	// Clean up after guest runs
      }
      
      The execution flow for that would look something like this to the
      processor:
      
      1. Host-side: call run_kvm_guest()
      2. Host-side: VMRESUME
      3. Guest runs, does "CALL guest_function"
      4. VM exit, host runs again
      5. Host might make some "cleanup" function calls
      6. Host-side: RET from run_kvm_guest()
      
      Now, when back on the host, there are a couple of possible scenarios of
      post-guest activity the host needs to do before executing host code:
      
      * on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
      touched and Linux has to do a 32-entry stuffing.
      
      * on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
      IBRS=1 shortly after VM exit, has a documented side effect of flushing
      the RSB except in this PBRSB situation where the software needs to stuff
      the last RSB entry "by hand".
      
      IOW, with eIBRS supported, host RET instructions should no longer be
      influenced by guest behavior after the host retires a single CALL
      instruction.
      
      However, if the RET instructions are "unbalanced" with CALLs after a VM
      exit as is the RET in #6, it might speculatively use the address for the
      instruction after the CALL in #3 as an RSB prediction. This is a problem
      since the (untrusted) guest controls this address.
      
      Balanced CALL/RET instruction pairs such as in step #5 are not affected.
      
      == Solution ==
      
      The PBRSB issue affects a wide variety of Intel processors which
      support eIBRS. But not all of them need mitigation. Today,
      X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
      PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
      eIBRS systems which enable legacy IBRS explicitly.
      
      However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
      and most of them need a new mitigation.
      
      Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
      which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.
      
      The lighter-weight mitigation performs a CALL instruction which is
      immediately followed by a speculative execution barrier (INT3). This
      steers speculative execution to the barrier -- just like a retpoline
      -- which ensures that speculation can never reach an unbalanced RET.
      Then, ensure this CALL is retired before continuing execution with an
      LFENCE.
      
      In other words, the window of exposure is opened at VM exit where RET
      behavior is troublesome. While the window is open, force RSB predictions
      sampling for RET targets to a dead end at the INT3. Close the window
      with the LFENCE.
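The exposure window described above can be illustrated with a toy model. The following is a minimal sketch in C, not kernel code: it treats the RSB as a small stack of predicted return addresses and assumes, purely for illustration, that the VM exit leaves exactly one guest-planted entry on top. The names `toy_rsb`, `GUEST_ADDR`, `INT3_TRAP`, and `predicted_host_ret` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RSB_DEPTH   32            /* depth used only for this toy model */
#define GUEST_ADDR  0xbad0000ULL  /* hypothetical guest-controlled return address */
#define INT3_TRAP   0x5afe000ULL  /* hypothetical address of the INT3 after the stuffing CALL */

struct toy_rsb {
	uint64_t entry[RSB_DEPTH];
	size_t top;                   /* number of valid entries */
};

/* A CALL pushes the address of the instruction that follows it. */
static void toy_call(struct toy_rsb *rsb, uint64_t ret_addr)
{
	assert(rsb->top < RSB_DEPTH);
	rsb->entry[rsb->top++] = ret_addr;
}

/* A RET is predicted from the top entry; 0 means "no RSB prediction". */
static uint64_t toy_ret(struct toy_rsb *rsb)
{
	return rsb->top ? rsb->entry[--rsb->top] : 0;
}

/*
 * Model steps 3, 4 and 6 of the flow: the guest CALL leaves one entry
 * behind, the host optionally stuffs one entry, then executes the
 * unbalanced RET.
 */
static uint64_t predicted_host_ret(int stuff_one_entry)
{
	struct toy_rsb rsb = { .top = 0 };

	toy_call(&rsb, GUEST_ADDR);        /* step 3: guest CALL */
	/* step 4: VM exit; assumed here to leave the guest entry in place */
	if (stuff_one_entry)
		toy_call(&rsb, INT3_TRAP); /* mitigation: CALL whose fall-through is INT3 */
	return toy_ret(&rsb);              /* step 6: unbalanced RET */
}
```

In this model, `predicted_host_ret(0)` yields the guest-controlled address, while `predicted_host_ret(1)` yields the INT3 dead end. The real mitigation additionally needs the LFENCE so that the stuffing CALL has retired before execution continues; the toy model has no notion of speculation timing.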
      
      There is a subset of eIBRS systems which are not vulnerable to PBRSB.
      Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
      Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.
      
        [ bp: Massage, incorporate review comments from Andrew Cooper. ]
      Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      conflict:
          arch/x86/include/asm/cpufeatures.h
      Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  3. 16 Sep, 2022 2 commits
  4. 05 Jul, 2022 2 commits
  5. 29 Dec, 2021 1 commit
    • tools arch x86: Sync the msr-index.h copy with the kernel sources · b9c76b02
      Arnaldo Carvalho de Melo committed
      mainline inclusion
      from mainline-5.16-rc6
      commit e9bde94f
      category: feature
      feature: milan cpu
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4NX57
      CVE: NA
      
      --------------------------------
      
      To pick up the changes in:
      
        d205e0f1 ("x86/{cpufeatures,msr}: Add Intel SGX Launch Control hardware bits")
        e7b6385b ("x86/cpufeatures: Add Intel SGX hardware bits")
        43756a29 ("powercap: Add AMD Fam17h RAPL support")
        298ed2b3 ("x86/msr-index: sort AMD RAPL MSRs by address")
        68299a42 ("x86/mce: Enable additional error logging on certain Intel CPUs")
      
      Which cause these changes in tooling:
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
        $ diff -u before after
        --- before	2020-12-17 14:45:49.036994450 -0300
        +++ after	2020-12-17 14:46:01.654256639 -0300
        @@ -22,6 +22,10 @@
         	[0x00000060] = "LBR_CORE_TO",
         	[0x00000079] = "IA32_UCODE_WRITE",
         	[0x0000008b] = "IA32_UCODE_REV",
        +	[0x0000008C] = "IA32_SGXLEPUBKEYHASH0",
        +	[0x0000008D] = "IA32_SGXLEPUBKEYHASH1",
        +	[0x0000008E] = "IA32_SGXLEPUBKEYHASH2",
        +	[0x0000008F] = "IA32_SGXLEPUBKEYHASH3",
         	[0x0000009b] = "IA32_SMM_MONITOR_CTL",
         	[0x0000009e] = "IA32_SMBASE",
         	[0x000000c1] = "IA32_PERFCTR0",
        @@ -59,6 +63,7 @@
         	[0x00000179] = "IA32_MCG_CAP",
         	[0x0000017a] = "IA32_MCG_STATUS",
         	[0x0000017b] = "IA32_MCG_CTL",
        +	[0x0000017f] = "ERROR_CONTROL",
         	[0x00000180] = "IA32_MCG_EAX",
         	[0x00000181] = "IA32_MCG_EBX",
         	[0x00000182] = "IA32_MCG_ECX",
        @@ -294,6 +299,7 @@
         	[0xc0010241 - x86_AMD_V_KVM_MSRs_offset] = "F15H_NB_PERF_CTR",
         	[0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC",
         	[0xc0010299 - x86_AMD_V_KVM_MSRs_offset] = "AMD_RAPL_POWER_UNIT",
        +	[0xc001029a - x86_AMD_V_KVM_MSRs_offset] = "AMD_CORE_ENERGY_STATUS",
         	[0xc001029b - x86_AMD_V_KVM_MSRs_offset] = "AMD_PKG_ENERGY_STATUS",
         	[0xc00102f0 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN_CTL",
         	[0xc00102f1 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN",
        $
      
      Which causes these parts of tools/perf/ to be rebuilt:
      
        CC       /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
        LD       /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
        LD       /tmp/build/perf/trace/beauty/perf-in.o
        LD       /tmp/build/perf/perf-in.o
        LINK     /tmp/build/perf/perf
      
      At some point these should just be tables read by perf on demand.
      
      This allows 'perf trace' users to use those strings to translate from
      the msr ids provided by the msr: tracepoints.
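The id-to-string translation can be sketched with the same designated-initializer layout that the diff above shows. This is a minimal illustration, not perf's actual code: the entries are copied from the diff, the base value `0xc0010000` for `x86_AMD_V_KVM_MSRs_offset` is an assumption made here, and the helpers `msr_name` and `msr_is` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed base of the AMD range; the generated source defines the real value. */
#define x86_AMD_V_KVM_MSRs_offset 0xc0010000u

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* A few entries copied from the diff above; unlisted slots are NULL. */
static const char *x86_MSRs[] = {
	[0x00000179] = "IA32_MCG_CAP",
	[0x0000017a] = "IA32_MCG_STATUS",
	[0x0000017b] = "IA32_MCG_CTL",
	[0x0000017f] = "ERROR_CONTROL",
};

static const char *x86_AMD_V_KVM_MSRs[] = {
	[0xc0010299 - x86_AMD_V_KVM_MSRs_offset] = "AMD_RAPL_POWER_UNIT",
	[0xc001029a - x86_AMD_V_KVM_MSRs_offset] = "AMD_CORE_ENERGY_STATUS",
	[0xc001029b - x86_AMD_V_KVM_MSRs_offset] = "AMD_PKG_ENERGY_STATUS",
};

/* Hypothetical helper: return the name for an MSR id, or NULL if unknown. */
static const char *msr_name(uint32_t id)
{
	if (id >= x86_AMD_V_KVM_MSRs_offset) {
		uint32_t idx = id - x86_AMD_V_KVM_MSRs_offset;

		return idx < ARRAY_SIZE(x86_AMD_V_KVM_MSRs) ? x86_AMD_V_KVM_MSRs[idx] : NULL;
	}
	return id < ARRAY_SIZE(x86_MSRs) ? x86_MSRs[id] : NULL;
}

/* Hypothetical helper: does the table map id to the expected name? */
static int msr_is(uint32_t id, const char *expected)
{
	const char *name = msr_name(id);

	assert(expected != NULL);
	return name != NULL && strcmp(name, expected) == 0;
}
```

Subtracting a per-range offset keeps the AMD table small even though its ids start near `0xc0010000`; a single flat array indexed by the raw id would need billions of slots.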
      
      This addresses this perf tools build warning:
      
        diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Victor Ding <victording@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      
      conflicts:
      The following patches haven't been backported:
        d205e0f1 ("x86/{cpufeatures,msr}: Add Intel SGX Launch Control hardware bits")
        e7b6385b ("x86/cpufeatures: Add Intel SGX hardware bits")
        68299a42 ("x86/mce: Enable additional error logging on certain Intel CPUs")
      so the parts of this patch that relate to them are not applied.
      Signed-off-by: qinyu <qinyu16@huawei.com>
      Reviewed-by: Chao Liu <liuchao173@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  6. 03 Nov, 2020 1 commit
    • tools arch x86: Sync the msr-index.h copy with the kernel sources · 32b734e0
      Arnaldo Carvalho de Melo committed
      To pick up the changes in:
      
        29dcc60f ("x86/boot/compressed/64: Add stage1 #VC handler")
        36e1be8a ("perf/x86/amd/ibs: Fix raw sample data accumulation")
        59a854e2 ("perf/x86/intel: Support TopDown metrics on Ice Lake")
        7b2c05a1 ("perf/x86/intel: Generic support for hardware TopDown metrics")
        99e40204 ("x86/msr: Move the F15h MSRs where they belong")
        b57de6cd ("x86/sev-es: Add SEV-ES Feature Detection")
        ed7bde7a ("cpufreq: intel_pstate: Allow enable/disable energy efficiency")
        f0f2f9fe ("x86/msr-index: Define an IA32_PASID MSR")
      
      Which cause these changes in tooling:
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
        $ diff -u before after
        --- before	2020-10-19 13:27:33.195274425 -0300
        +++ after	2020-10-19 13:27:44.144507610 -0300
        @@ -113,6 +113,8 @@
         	[0x00000309] = "CORE_PERF_FIXED_CTR0",
         	[0x0000030a] = "CORE_PERF_FIXED_CTR1",
         	[0x0000030b] = "CORE_PERF_FIXED_CTR2",
        +	[0x0000030c] = "CORE_PERF_FIXED_CTR3",
        +	[0x00000329] = "PERF_METRICS",
         	[0x00000345] = "IA32_PERF_CAPABILITIES",
         	[0x0000038d] = "CORE_PERF_FIXED_CTR_CTRL",
         	[0x0000038e] = "CORE_PERF_GLOBAL_STATUS",
        @@ -222,6 +224,7 @@
         	[0x00000774] = "HWP_REQUEST",
         	[0x00000777] = "HWP_STATUS",
         	[0x00000d90] = "IA32_BNDCFGS",
        +	[0x00000d93] = "IA32_PASID",
         	[0x00000da0] = "IA32_XSS",
         	[0x00000dc0] = "LBR_INFO_0",
         	[0x00000ffc] = "IA32_BNDCFGS_RSVD",
        @@ -279,6 +282,7 @@
         	[0xc0010115 - x86_AMD_V_KVM_MSRs_offset] = "VM_IGNNE",
         	[0xc0010117 - x86_AMD_V_KVM_MSRs_offset] = "VM_HSAVE_PA",
         	[0xc001011f - x86_AMD_V_KVM_MSRs_offset] = "AMD64_VIRT_SPEC_CTRL",
        +	[0xc0010130 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_SEV_ES_GHCB",
         	[0xc0010131 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_SEV",
         	[0xc0010140 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_OSVW_ID_LENGTH",
         	[0xc0010141 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_OSVW_STATUS",
        $
      
      Which causes these parts of tools/perf/ to be rebuilt:
      
        CC       /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
        DESCEND  plugins
        GEN      /tmp/build/perf/python/perf.so
        INSTALL  trace_plugins
        LD       /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
        LD       /tmp/build/perf/trace/beauty/perf-in.o
        LD       /tmp/build/perf/perf-in.o
        LINK     /tmp/build/perf/per
      
      At some point these should just be tables read by perf on demand.
      
      This addresses this perf tools build warning:
      
        diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kim Phillips <kim.phillips@amd.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 07 Aug, 2020 1 commit
    • tools arch x86: Sync the msr-index.h copy with the kernel sources · f815fe51
      Arnaldo Carvalho de Melo committed
      To pick up the changes in:
      
        d6a162a4 x86/msr-index: Add bunch of MSRs for Arch LBR
        ed7bde7a cpufreq: intel_pstate: Allow enable/disable energy efficiency
        99e40204 (tip/x86/cleanups) x86/msr: Move the F15h MSRs where they belong
        1068ed45 x86/msr: Lift AMD family 0x15 power-specific MSRs
        5cde2653 (tag: perf-core-2020-06-01) perf/x86/rapl: Add AMD Fam17h RAPL support
      
      Addressing these tools/perf build warnings:
      
      That makes the beautification scripts pick up some new entries:
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
        $ diff -u before after
        --- before	2020-08-07 08:45:18.801298854 -0300
        +++ after	2020-08-07 08:45:28.654456422 -0300
        @@ -271,6 +271,8 @@
         	[0xc0010062 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PERF_CTL",
         	[0xc0010063 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PERF_STATUS",
         	[0xc0010064 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PSTATE_DEF_BASE",
        +	[0xc001007a - x86_AMD_V_KVM_MSRs_offset] = "F15H_CU_PWR_ACCUMULATOR",
        +	[0xc001007b - x86_AMD_V_KVM_MSRs_offset] = "F15H_CU_MAX_PWR_ACCUMULATOR",
         	[0xc0010112 - x86_AMD_V_KVM_MSRs_offset] = "K8_TSEG_ADDR",
         	[0xc0010113 - x86_AMD_V_KVM_MSRs_offset] = "K8_TSEG_MASK",
         	[0xc0010114 - x86_AMD_V_KVM_MSRs_offset] = "VM_CR",
        $
      
      And this gets rebuilt:
      
        CC       /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
        INSTALL  trace_plugins
        LD       /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
        LD       /tmp/build/perf/trace/beauty/perf-in.o
        LD       /tmp/build/perf/perf-in.o
        LINK     /tmp/build/perf/perf
      
      Now one can trace systemwide asking to see backtraces to where those
      MSRs are being read/written with:
      
        # perf trace -e msr:*_msr/max-stack=32/ --filter="msr==F15H_CU_PWR_ACCUMULATOR || msr==F15H_CU_MAX_PWR_ACCUMULATOR"
        ^C#
        #
      
      If we use -v (verbose mode) we can see what it does behind the scenes:
      
        # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==F15H_CU_PWR_ACCUMULATOR || msr==F15H_CU_MAX_PWR_ACCUMULATOR"
        Using CPUID GenuineIntel-6-8E-A
        0xc001007a
        0xc001007b
        New filter for msr:read_msr: (msr==0xc001007a || msr==0xc001007b) && (common_pid != 2448054 && common_pid != 2782)
        0xc001007a
        0xc001007b
        New filter for msr:write_msr: (msr==0xc001007a || msr==0xc001007b) && (common_pid != 2448054 && common_pid != 2782)
        mmap size 528384B
        ^C#
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 22 Jun, 2020 1 commit
    • x86/msr: Move the F15h MSRs where they belong · 99e40204
      Borislav Petkov committed
      1068ed45 ("x86/msr: Lift AMD family 0x15 power-specific MSRs")
      
      moved the three F15h power MSRs to the architectural list but that was
      wrong as they belong in the family 0x15 list. That also caused:
      
        In file included from trace/beauty/tracepoints/x86_msr.c:10:
        perf/trace/beauty/generated/x86_arch_MSRs_array.c:292:45: error: initialized field overwritten [-Werror=override-init]
          292 |  [0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC",
              |                                             ^~~~~~~~~~~
        perf/trace/beauty/generated/x86_arch_MSRs_array.c:292:45: note: (near initialization for 'x86_AMD_V_KVM_MSRs[640]')
      
      due to MSR_F15H_PTSC ending up being defined twice. Move them where they
      belong and drop the duplicate.
      
      Also, drop the respective tools/ changes of the msr-index.h copy the
      above commit added because perf tool developers prefer to go through
      those changes themselves in order to figure out whether changes to the
      kernel headers would need additional handling in perf.
      
      Fixes: 1068ed45 ("x86/msr: Lift AMD family 0x15 power-specific MSRs")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lkml.kernel.org/r/20200621163323.14e8533f@canb.auug.org.au
  9. 18 Jun, 2020 1 commit
    • tools arch x86: Sync the msr-index.h copy with the kernel sources · 25ca7e5c
      Arnaldo Carvalho de Melo committed
      To pick up the changes in:
      
        7e5b3c26 ("x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation")
      
      Addressing these tools/perf build warnings:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
        diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
        diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
      
      With this, one will be able to use this new MSR in filters, by
      name, e.g.:
      
        # perf trace -e msr:* --filter "msr==IA32_MCU_OPT_CTRL"
        ^C#
      
      Using -v we can see how it sets up the tracepoint filters, converting
      from the string in the filter to the numeric value:
      
        # perf trace -v -e msr:* --filter "msr==IA32_MCU_OPT_CTRL"
        Using CPUID GenuineIntel-6-8E-A
        0x123
        New filter for msr:read_msr: (msr==0x123) && (common_pid != 335 && common_pid != 30344)
        0x123
        New filter for msr:write_msr: (msr==0x123) && (common_pid != 335 && common_pid != 30344)
        0x123
        New filter for msr:rdpmc: (msr==0x123) && (common_pid != 335 && common_pid != 30344)
        mmap size 528384B
        ^C#
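The name-to-number conversion visible in the verbose output above can be sketched as a table lookup followed by string formatting. This is a hedged illustration rather than perf's implementation: the two table entries reuse names and values that appear in this log, and `msr_entry`, `msr_table`, and `expand_msr_filter` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct msr_entry {
	const char *name;
	uint32_t id;
};

/* Two names and values taken from this log; a real table would be generated. */
static const struct msr_entry msr_table[] = {
	{ "IA32_TSX_CTRL",     0x122 },
	{ "IA32_MCU_OPT_CTRL", 0x123 },
};

/*
 * Hypothetical helper: rewrite the symbolic operand of "msr==<NAME>"
 * as "(msr==0x<id>)". Returns 0 on success, -1 if the name is unknown
 * or buf is too small.
 */
static int expand_msr_filter(const char *name, char *buf, size_t len)
{
	assert(name != NULL && buf != NULL);
	for (size_t i = 0; i < sizeof(msr_table) / sizeof(msr_table[0]); i++) {
		if (strcmp(msr_table[i].name, name) == 0) {
			int n = snprintf(buf, len, "(msr==%#x)",
					 (unsigned int)msr_table[i].id);

			return (n > 0 && (size_t)n < len) ? 0 : -1;
		}
	}
	return -1;
}
```

Calling `expand_msr_filter("IA32_MCU_OPT_CTRL", buf, sizeof(buf))` fills `buf` with `(msr==0x123)`, the same expression that appears in the "New filter" lines. The `common_pid` clauses that perf appends are outside the scope of this sketch.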
      
      The updating process shows how this affects tooling in more detail:
      
        $ diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
        --- tools/arch/x86/include/asm/msr-index.h	2020-06-03 10:36:09.959910238 -0300
        +++ arch/x86/include/asm/msr-index.h	2020-06-17 10:04:20.235052901 -0300
        @@ -128,6 +128,10 @@
         #define TSX_CTRL_RTM_DISABLE		BIT(0)	/* Disable RTM feature */
         #define TSX_CTRL_CPUID_CLEAR		BIT(1)	/* Disable TSX enumeration */
      
        +/* SRBDS support */
        +#define MSR_IA32_MCU_OPT_CTRL		0x00000123
        +#define RNGDS_MITG_DIS			BIT(0)
        +
         #define MSR_IA32_SYSENTER_CS		0x00000174
         #define MSR_IA32_SYSENTER_ESP		0x00000175
         #define MSR_IA32_SYSENTER_EIP		0x00000176
        $ set -o vi
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
        $ diff -u before after
        --- before	2020-06-17 10:05:49.653114752 -0300
        +++ after	2020-06-17 10:06:01.777258731 -0300
        @@ -51,6 +51,7 @@
         	[0x0000011e] = "IA32_BBL_CR_CTL3",
         	[0x00000120] = "IDT_MCR_CTRL",
         	[0x00000122] = "IA32_TSX_CTRL",
        +	[0x00000123] = "IA32_MCU_OPT_CTRL",
         	[0x00000140] = "MISC_FEATURES_ENABLES",
         	[0x00000174] = "IA32_SYSENTER_CS",
         	[0x00000175] = "IA32_SYSENTER_ESP",
        $
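The relationship between the header diff and the table diff above can be sketched as a per-line transformation. The real generator is the `x86_msr.sh` shell script; the C function below only mimics the visible input and output for a single line, and its name `define_to_entry` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn "#define MSR_<NAME> 0x<hex>" into
 * "[0x<hex>] = \"<NAME>\",". Returns 0 on success, -1 if the line is
 * not an MSR address definition (for example a BIT() flag) or if out
 * is too small.
 */
static int define_to_entry(const char *line, char *out, size_t len)
{
	char name[64];
	unsigned int id;
	int n;

	assert(line != NULL && out != NULL);
	/* The "MSR_" prefix is matched literally and therefore dropped. */
	if (sscanf(line, "#define MSR_%63s 0x%x", name, &id) != 2)
		return -1;
	n = snprintf(out, len, "[0x%08x] = \"%s\",", id, name);
	return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

Under this sketch, the `MSR_IA32_MCU_OPT_CTRL` line produces the `[0x00000123] = "IA32_MCU_OPT_CTRL",` entry, while the `RNGDS_MITG_DIS` line is rejected because it does not name an MSR address.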
      
      The related change to cpufeatures.h affects this:
      
        CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
        CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o
      
      This shouldn't be affecting that 'perf bench' entry:
      
        $ find tools/perf/ -type f | xargs grep SRBDS
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Gross <mgross@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 16 Jun, 2020 1 commit
  11. 02 Jun, 2020 1 commit
    • tools arch x86: Sync the msr-index.h copy with the kernel sources · 3b1f47d6
      Arnaldo Carvalho de Melo committed
      To pick up the changes in:
      
        5cde2653 ("perf/x86/rapl: Add AMD Fam17h RAPL support")
      
      Addressing this tools/perf build warning:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
        diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
      
      With this one will be able to use these new AMD MSRs in filters, by
      name, e.g.:
      
         # perf trace -e msr:* --filter="msr==AMD_PKG_ENERGY_STATUS || msr==AMD_RAPL_POWER_UNIT"
      
      Just like it is now possible with other MSRs:
      
        [root@five ~]# uname -a
        Linux five 5.5.17-200.fc31.x86_64 #1 SMP Mon Apr 13 15:29:42 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
        [root@five ~]# grep 'model name' -m1 /proc/cpuinfo
        model name	: AMD Ryzen 5 3600X 6-Core Processor
        [root@five ~]#
        [root@five ~]# perf trace -e msr:*/max-stack=16/ --filter="msr==AMD_PERF_CTL" --max-events=2
             0.000 kworker/1:1-ev/2327824 msr:write_msr(msr: AMD_PERF_CTL, val: 2)
                                               do_trace_write_msr ([kernel.kallsyms])
                                               do_trace_write_msr ([kernel.kallsyms])
                                               [0xffffffffc01d71c3] ([acpi_cpufreq])
                                               [0] ([unknown])
                                               __cpufreq_driver_target ([kernel.kallsyms])
                                               od_dbs_update ([kernel.kallsyms])
                                               dbs_work_handler ([kernel.kallsyms])
                                               process_one_work ([kernel.kallsyms])
                                               worker_thread ([kernel.kallsyms])
                                               kthread ([kernel.kallsyms])
                                               ret_from_fork ([kernel.kallsyms])
             8.597 kworker/2:2-ev/2338099 msr:write_msr(msr: AMD_PERF_CTL, val: 2)
                                               do_trace_write_msr ([kernel.kallsyms])
                                               do_trace_write_msr ([kernel.kallsyms])
                                               [0] ([unknown])
                                               [0] ([unknown])
                                               __cpufreq_driver_target ([kernel.kallsyms])
                                               od_dbs_update ([kernel.kallsyms])
                                               dbs_work_handler ([kernel.kallsyms])
                                               process_one_work ([kernel.kallsyms])
                                               worker_thread ([kernel.kallsyms])
                                               kthread ([kernel.kallsyms])
                                               ret_from_fork ([kernel.kallsyms])
        [root@five ~]#
      
      A longer explanation of what happens in the perf build process,
      automatically, after this is brought in sync with the kernel sources:
      
        $ make -C tools/perf O=/tmp/build/perf install-bin
        <SNIP>
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
        diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
        <SNIP>
        make: Leaving directory '/home/acme/git/perf/tools/perf'
        $
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
        $
        $ diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
        --- tools/arch/x86/include/asm/msr-index.h	2020-06-02 10:46:36.217782288 -0300
        +++ arch/x86/include/asm/msr-index.h	2020-05-28 10:41:23.313794627 -0300
        @@ -301,6 +301,9 @@
         #define MSR_PP1_ENERGY_STATUS		0x00000641
         #define MSR_PP1_POLICY			0x00000642
      
        +#define MSR_AMD_PKG_ENERGY_STATUS	0xc001029b
        +#define MSR_AMD_RAPL_POWER_UNIT		0xc0010299
        +
         /* Config TDP MSRs */
         #define MSR_CONFIG_TDP_NOMINAL		0x00000648
         #define MSR_CONFIG_TDP_LEVEL_1		0x00000649
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $
        $ make -C tools/perf O=/tmp/build/perf install-bin
        <SNIP>
          CC       /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
          LD       /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
          LD       /tmp/build/perf/trace/beauty/perf-in.o
          LD       /tmp/build/perf/perf-in.o
          LINK     /tmp/build/perf/perf
        <SNIP>
        make: Leaving directory '/home/acme/git/perf/tools/perf'
        $
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
        $ diff -u before after
        --- before	2020-06-02 10:47:08.486334348 -0300
        +++ after	2020-06-02 10:47:33.075008948 -0300
        @@ -286,6 +286,8 @@
         	[0xc0010240 - x86_AMD_V_KVM_MSRs_offset] = "F15H_NB_PERF_CTL",
         	[0xc0010241 - x86_AMD_V_KVM_MSRs_offset] = "F15H_NB_PERF_CTR",
         	[0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC",
        +	[0xc0010299 - x86_AMD_V_KVM_MSRs_offset] = "AMD_RAPL_POWER_UNIT",
        +	[0xc001029b - x86_AMD_V_KVM_MSRs_offset] = "AMD_PKG_ENERGY_STATUS",
         	[0xc00102f0 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN_CTL",
         	[0xc00102f1 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN",
         };
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  12. 14 Apr, 2020 1 commit
    • tools arch x86: Sync the msr-index.h copy with the kernel sources · bab1a501
      Arnaldo Carvalho de Melo committed
      To pick up the changes in:
      
        6650cdd9 ("x86/split_lock: Enable split lock detection by kernel")
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
        diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
      
      Which causes these changes in tooling:
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
        $ diff -u before after
        --- before	2020-04-01 12:11:14.789344795 -0300
        +++ after	2020-04-01 12:11:56.907798879 -0300
        @@ -10,6 +10,7 @@
         	[0x00000029] = "KNC_EVNTSEL1",
         	[0x0000002a] = "IA32_EBL_CR_POWERON",
         	[0x0000002c] = "EBC_FREQUENCY_ID",
        +	[0x00000033] = "TEST_CTRL",
         	[0x00000034] = "SMI_COUNT",
         	[0x0000003a] = "IA32_FEAT_CTL",
         	[0x0000003b] = "IA32_TSC_ADJUST",
        @@ -27,6 +28,7 @@
         	[0x000000c2] = "IA32_PERFCTR1",
         	[0x000000cd] = "FSB_FREQ",
         	[0x000000ce] = "PLATFORM_INFO",
        +	[0x000000cf] = "IA32_CORE_CAPS",
         	[0x000000e2] = "PKG_CST_CONFIG_CONTROL",
         	[0x000000e7] = "IA32_MPERF",
         	[0x000000e8] = "IA32_APERF",
        $
      
        $ make -C tools/perf O=/tmp/build/perf install-bin
        <SNIP>
          CC       /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
          LD       /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
          LD       /tmp/build/perf/trace/beauty/perf-in.o
          LD       /tmp/build/perf/perf-in.o
          LINK     /tmp/build/perf/perf
        <SNIP>
      
      Now one can do:
      
      	perf trace -e msr:* --filter=msr==IA32_CORE_CAPS
      
      or:
      
      	perf trace -e msr:* --filter='msr==IA32_CORE_CAPS || msr==TEST_CTRL'
      
      And see only those MSRs being accessed via:
      
        # perf trace -v -e msr:* --filter='msr==IA32_CORE_CAPS || msr==TEST_CTRL'
        New filter for msr:read_msr: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250)
        New filter for msr:write_msr: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250)
        New filter for msr:rdpmc: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250)
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/lkml/20200401153325.GC12534@kernel.org/
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bab1a501
  13. 27 Feb, 2020 1 commit
    • A
      tools arch x86: Sync the msr-index.h copy with the kernel sources · d8e3ee2e
      Arnaldo Carvalho de Melo committed
      To pick up the changes from these csets:
      
        21b5ee59 ("x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF")
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $ git diff
        diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h
        index ebe1685e92dd..d5e517d1c3dd 100644
        --- a/tools/arch/x86/include/asm/msr-index.h
        +++ b/tools/arch/x86/include/asm/msr-index.h
        @@ -512,6 +512,8 @@
         #define MSR_K7_HWCR                    0xc0010015
         #define MSR_K7_HWCR_SMMLOCK_BIT                0
         #define MSR_K7_HWCR_SMMLOCK            BIT_ULL(MSR_K7_HWCR_SMMLOCK_BIT)
        +#define MSR_K7_HWCR_IRPERF_EN_BIT      30
        +#define MSR_K7_HWCR_IRPERF_EN          BIT_ULL(MSR_K7_HWCR_IRPERF_EN_BIT)
         #define MSR_K7_FID_VID_CTL             0xc0010041
         #define MSR_K7_FID_VID_STATUS          0xc0010042
        $
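
For readers unfamiliar with the BIT_ULL() idiom used by the two new defines, here is a minimal userspace sketch. BIT_ULL() is re-defined locally as a stand-in for the kernel helper, and hwcr_enable_irperf() is a hypothetical function added only for illustration.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT_ULL() helper. */
#define BIT_ULL(nr) (1ULL << (nr))

#define MSR_K7_HWCR_IRPERF_EN_BIT 30
#define MSR_K7_HWCR_IRPERF_EN     BIT_ULL(MSR_K7_HWCR_IRPERF_EN_BIT)

/* Hypothetical helper: return the HWCR value with the IRPERF enable bit set. */
static uint64_t hwcr_enable_irperf(uint64_t hwcr)
{
    return hwcr | MSR_K7_HWCR_IRPERF_EN;
}
```

Since only bit definitions were added and no new MSR number, the generated MSR name table is not expected to change.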
      
      That doesn't result in any change in tooling:
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
        $ diff -u before after
        $
      
      To silence this perf build warning:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
        diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@amd.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d8e3ee2e
  14. 14 Jan, 2020 1 commit
  15. 02 Dec, 2019 1 commit
    • A
      tools arch x86: Sync the msr-index.h copy with the kernel sources · 8122b047
      Arnaldo Carvalho de Melo committed
      To pick up the changes from these csets:
      
        3f3c8be9 Merge tag 'for-linus-5.5a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
        4e3f77d8 ("xen/mcelog: add PPIN to record when available")
        db4d30fb ("x86/bugs: Add ITLB_MULTIHIT bug infrastructure")
        1b42f017 ("x86/speculation/taa: Add mitigation for TSX Async Abort")
        c2955f27 ("x86/msr: Add the IA32_TSX_CTRL MSR")
      
      These are the changes in tooling that this update brings about:
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > /tmp/before
        $
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > /tmp/after
        $ diff -u /tmp/before /tmp/after
        --- /tmp/before	2019-12-02 11:54:44.371035723 -0300
        +++ /tmp/after	2019-12-02 11:55:31.847859784 -0300
        @@ -48,6 +48,7 @@
         	[0x00000119] = "IA32_BBL_CR_CTL",
         	[0x0000011e] = "IA32_BBL_CR_CTL3",
         	[0x00000120] = "IDT_MCR_CTRL",
        +	[0x00000122] = "IA32_TSX_CTRL",
         	[0x00000140] = "MISC_FEATURES_ENABLES",
         	[0x00000174] = "IA32_SYSENTER_CS",
         	[0x00000175] = "IA32_SYSENTER_ESP",
        @@ -283,4 +284,6 @@
         	[0xc0010240 - x86_AMD_V_KVM_MSRs_offset] = "F15H_NB_PERF_CTL",
         	[0xc0010241 - x86_AMD_V_KVM_MSRs_offset] = "F15H_NB_PERF_CTR",
         	[0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC",
        +	[0xc00102f0 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN_CTL",
        +	[0xc00102f1 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN",
         };
        $
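
The "- x86_AMD_V_KVM_MSRs_offset" indexing seen in the diff keeps the table small even though the MSR numbers start high. A minimal sketch of that technique follows; AMD_MSRS_OFFSET (0xc0010000) is an assumed base that is not stated in the text above, and amd_msrs[] and amd_msr_name() are hypothetical names for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed base of the AMD range; the real offset is not shown above. */
#define AMD_MSRS_OFFSET 0xc0010000u

/* Sparse table indexed by (MSR number - offset); unset slots are NULL. */
static const char *const amd_msrs[] = {
    [0xc0010280 - AMD_MSRS_OFFSET] = "F15H_PTSC",
    [0xc00102f0 - AMD_MSRS_OFFSET] = "AMD_PPIN_CTL",
    [0xc00102f1 - AMD_MSRS_OFFSET] = "AMD_PPIN",
};

/* Return the name for an MSR number, or NULL if it is out of range or unset. */
static const char *amd_msr_name(uint32_t nr)
{
    size_t n = sizeof(amd_msrs) / sizeof(amd_msrs[0]);

    if (nr < AMD_MSRS_OFFSET || nr - AMD_MSRS_OFFSET >= n)
        return NULL;
    return amd_msrs[nr - AMD_MSRS_OFFSET];
}
```

Subtracting the offset means the array needs a few hundred slots instead of more than three billion.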
      
        CC       /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
        LD       /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
        LD       /tmp/build/perf/trace/beauty/perf-in.o
        LD       /tmp/build/perf/perf-in.o
      
      Now it is possible to use these strings when setting up filters for the msr:*
      tracepoints, like:
      
        # perf trace -e msr:* --filter=msr==IA32_TSX_CTRL
        ^C[root@quaco ~]#
      
      If we use an invalid operator, the error message shows the filter that
      perf tried to put in place:
      
        # perf trace -e msr:* --filter=msr=IA32_TSX_CTRL
        Failed to set filter "(msr=0x122) && (common_pid != 25976 && common_pid != 25860)" on event msr:read_msr with 22 (Invalid argument)
      
      One can also use -v to see the tracepoints and their filters:
      
        # perf trace -v -e msr:* --filter=msr==IA32_TSX_CTRL
        Using CPUID GenuineIntel-6-8E-A
        New filter for msr:read_msr: (msr==0x122) && (common_pid != 26110 && common_pid != 25860)
        New filter for msr:write_msr: (msr==0x122) && (common_pid != 26110 && common_pid != 25860)
        New filter for msr:rdpmc: (msr==0x122) && (common_pid != 26110 && common_pid != 25860)
        mmap size 528384B
        ^C#
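
The filter text printed by -v has a regular shape: the user expression with the name replaced by its number, followed by a clause that excludes two PIDs. A hedged sketch of how such a string could be assembled is shown below; build_msr_filter() is a hypothetical helper, not a perf function, and the meaning of the two PIDs is not stated above, so they are treated as opaque inputs.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a filter like the ones in the -v output.
 * Returns the snprintf() result, i.e. the length the full string needs.
 */
static int build_msr_filter(char *buf, size_t len, uint32_t msr,
                            int pid_a, int pid_b)
{
    return snprintf(buf, len,
                    "(msr==%#x) && (common_pid != %d && common_pid != %d)",
                    (unsigned int)msr, pid_a, pid_b);
}
```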
      
      This is better than repeatedly looking up those numbers, and it works
      with callchains as well, e.g. for something more common:
      
        # perf trace -e msr:*/max-stack=16/ --filter="msr==IA32_SPEC_CTRL" --max-events=2
             0.000 SCTP timer/6158 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
                                               do_trace_write_msr ([kernel.kallsyms])
                                               do_trace_write_msr ([kernel.kallsyms])
                                               __switch_to_xtra ([kernel.kallsyms])
                                               __switch_to ([kernel.kallsyms])
                                               __sched_text_start ([kernel.kallsyms])
                                               schedule ([kernel.kallsyms])
                                               schedule_hrtimeout_range_clock ([kernel.kallsyms])
                                               poll_schedule_timeout.constprop.0 ([kernel.kallsyms])
                                               do_select ([kernel.kallsyms])
                                               core_sys_select ([kernel.kallsyms])
                                               kern_select ([kernel.kallsyms])
                                               __x64_sys_select ([kernel.kallsyms])
                                               do_syscall_64 ([kernel.kallsyms])
                                               entry_SYSCALL_64 ([kernel.kallsyms])
                                               __select (/usr/lib64/libc-2.29.so)
                                               [0] ([unknown])
             0.024 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL)
                                               do_trace_write_msr ([kernel.kallsyms])
                                               do_trace_write_msr ([kernel.kallsyms])
                                               __switch_to_xtra ([kernel.kallsyms])
                                               __switch_to ([kernel.kallsyms])
                                               __sched_text_start ([kernel.kallsyms])
                                               schedule_idle ([kernel.kallsyms])
                                               do_idle ([kernel.kallsyms])
                                               cpu_startup_entry ([kernel.kallsyms])
                                               start_secondary ([kernel.kallsyms])
                                               [0x2000d4] ([kernel.kallsyms])
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
      Link: https://lkml.kernel.org/n/tip-n1xd78fpd5lxn4q1brqi2jl6@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8122b047
  16. 07 Oct, 2019 1 commit
  17. 28 Aug, 2019 1 commit
  18. 20 Aug, 2019 1 commit
    • T
      x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h · c49a0a80
      Tom Lendacky committed
      There have been reports of RDRAND issues after resuming from suspend on
      some AMD family 15h and family 16h systems. This issue stems from a BIOS
      not performing the proper steps during resume to ensure RDRAND continues
      to function properly.
      
      RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be
      reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND
      support using CPUID, including the kernel, will believe that RDRAND is
      not supported.
      
      Update the CPU initialization to clear the RDRAND CPUID bit for any family
      15h and 16h processor that supports RDRAND. If it is known that the family
      15h or family 16h system does not have an RDRAND resume issue or that the
      system will not be placed in suspend, the "rdrand=force" kernel parameter
      can be used to stop the clearing of the RDRAND CPUID bit.
      
      Additionally, update the suspend and resume path to save and restore the
      MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in
      place after resuming from suspend.
      
      Note that clearing the RDRAND CPUID bit does not prevent a processor
      that normally supports the RDRAND instruction from executing it. So any
      code that determined the support based on family and model won't #UD.
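
The bit manipulation and the policy described above can be sketched in plain C. This is only an illustration under stated assumptions, not the kernel code: the macro and function names are hypothetical, while the bit positions (CPUID Fn00000001_ECX[30], MSR C001_1004[62]), the affected families and the "rdrand=force" override come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_ECX_RDRAND   (1u << 30)    /* CPUID Fn00000001_ECX[30] */
#define MSR_C001_1004_RDRAND (1ULL << 62)  /* MSR C001_1004[62] */

/* True if CPUID leaf 1 ECX advertises RDRAND. */
static bool cpuid_reports_rdrand(uint32_t ecx)
{
    return (ecx & CPUID_1_ECX_RDRAND) != 0;
}

/* Return the MSR C001_1004 value with bit 62 cleared, hiding RDRAND in CPUID. */
static uint64_t hide_rdrand(uint64_t msr_val)
{
    return msr_val & ~MSR_C001_1004_RDRAND;
}

/* Hypothetical policy check: family 15h/16h, RDRAND present, no force override. */
static bool should_hide_rdrand(unsigned int family, uint32_t cpuid_1_ecx,
                               bool rdrand_force)
{
    if (family != 0x15 && family != 0x16)
        return false;
    if (!cpuid_reports_rdrand(cpuid_1_ecx))
        return false;
    return !rdrand_force;
}
```
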
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chen Yu <yu.c.chen@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>
      Cc: "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
      Cc: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: <stable@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "x86@kernel.org" <x86@kernel.org>
      Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.1566229943.git.thomas.lendacky@amd.com
      c49a0a80
  19. 19 Aug, 2019 1 commit
  20. 24 Jun, 2019 1 commit
    • F
      x86/umwait: Initialize umwait control values · bd688c69
      Fenghua Yu committed
      umwait or tpause allows the processor to enter a light-weight
      power/performance optimized state (C0.1 state) or an improved
      power/performance optimized state (C0.2 state). The processor stays
      there for a period specified by the instruction, until the system time
      limit is reached or, for umwait, until a store hits the monitored
      address range.
      
      The IA32_UMWAIT_CONTROL MSR allows the OS to enable/disable C0.2 on
      the processor and to set the maximum time the processor can reside in C0.1
      or C0.2.
      
      By default C0.2 is enabled so the user wait instructions can enter the
      C0.2 state to save more power with slower wakeup time.
      
      Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by
      default. A quote from Andy:
      
        "What I want to avoid is the case where it works dramatically differently
         on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may
         behave a bit differently if the max timeout is hit, and I'd like that
         path to get exercised widely by making it happen even on default
         configs."
      
      A sysfs interface to adjust the time and the C0.2 enablement is provided in
      a follow up change.
      
      [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to
        	MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as
        	mask throughout the code.
      	Massaged comments and changelog ]
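
To make the register composition concrete, here is a small sketch of how a control value could be built from the maximum time and the C0.2 setting. The bit layout used here (bit 0 disables C0.2, bit 1 unused, the remaining bits hold the maximum time) is an assumption not spelled out in the text above, and the UMWAIT_* names and umwait_ctrl_val() are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: bit 0 = C0.2 disable, bits 31:2 = maximum wait time. */
#define UMWAIT_C02_DISABLE 0x1u
#define UMWAIT_TIME_MASK   (~0x03u)

/* Combine the maximum time (low two bits dropped) with the C0.2 disable bit. */
static uint32_t umwait_ctrl_val(uint32_t max_time, uint32_t c02_disable)
{
    return (max_time & UMWAIT_TIME_MASK) | (c02_disable & UMWAIT_C02_DISABLE);
}
```

Under these assumptions the proposed default, 100000 cycles with C0.2 enabled, is umwait_ctrl_val(100000, 0). The constant acts as a mask on the time field, which matches the reason given for the rename.
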
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
      bd688c69
  21. 01 May, 2019 2 commits
  22. 16 Apr, 2019 1 commit
    • K
      perf/x86/intel: Support adaptive PEBS v4 · c22497f5
      Kan Liang committed
      Adaptive PEBS is a new way to report PEBS sampling information. Instead
      of a fixed size record for all PEBS events, it allows configuring the
      PEBS record to include only the information needed. Events can then opt
      in to use such an extended record, or stay with a basic record which
      only contains the IP.
      
      The major new feature is to support LBRs in PEBS record.
      Besides normal LBR, this allows (much faster) large PEBS, while still
      supporting callstacks through callstack LBR. So essentially a lot of
      profiling can now be done without frequent interrupts, dropping the
      overhead significantly.
      
      The main requirement still is to use a period, and not use frequency
      mode, because frequency mode requires reevaluating the frequency on each
      overflow.
      
      The floating point state (XMM) is also supported, which allows efficient
      profiling of FP function arguments.
      
      Introduce specific drain function to handle variable length records.
      Use a new callback to parse the new record format, and also handle the
      STATUS field now being at a different offset.
      
      Add code to set up the configuration register. Since there is only a
      single register, all events either get the full super set of all events,
      or only the basic record.
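
The consequence of having a single configuration register can be sketched as follows: the register value is the union of what every opted-in event asked for, so each extended record carries the full superset. The EX_PEBS_* bits are illustrative placeholders, not the hardware encoding, and pebs_shared_cfg() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative record groups; the real register encoding is not given above. */
#define EX_PEBS_MEMINFO (1ULL << 0)
#define EX_PEBS_GP      (1ULL << 1)
#define EX_PEBS_XMM     (1ULL << 2)
#define EX_PEBS_LBR     (1ULL << 3)

/*
 * OR together the groups wanted by each event. A zero entry stands for an
 * event that stays with the basic record and so adds nothing to the config.
 */
static uint64_t pebs_shared_cfg(const uint64_t *wanted, size_t n)
{
    uint64_t cfg = 0;

    for (size_t i = 0; i < n; i++)
        cfg |= wanted[i];
    return cfg;
}
```
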
      Originally-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-6-kan.liang@linux.intel.com
      [ Renamed GPRS => GP. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c22497f5
  23. 07 Mar, 2019 2 commits
    • A
      x86/speculation/mds: Add basic bug infrastructure for MDS · ed5194c2
      Andi Kleen committed
      Microarchitectural Data Sampling (MDS) is a class of side channel attacks
      on internal buffers in Intel CPUs. The variants are:
      
       - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
       - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
       - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
      
      MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
      dependent load (store-to-load forwarding) as an optimization. The forward
      can also happen to a faulting or assisting load operation for a different
      memory address, which can be exploited under certain conditions. Store
      buffers are partitioned between Hyper-Threads so cross thread forwarding is
      not possible. But if a thread enters or exits a sleep state the store
      buffer is repartitioned which can expose data from one thread to the other.
      
      MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
      L1 miss situations and to hold data which is returned or sent in response
      to a memory or I/O operation. Fill buffers can forward data to a load
      operation and also write data to the cache. When the fill buffer is
      deallocated it can retain the stale data of the preceding operations which
      can then be forwarded to a faulting or assisting load operation, which can
      be exploited under certain conditions. Fill buffers are shared between
      Hyper-Threads so cross thread leakage is possible.
      
      MLPDS leaks Load Port Data. Load ports are used to perform load operations
      from memory or I/O. The received data is then forwarded to the register
      file or a subsequent operation. In some implementations the Load Port can
      contain stale data from a previous operation which can be forwarded to
      faulting or assisting loads under certain conditions, which again can be
      exploited eventually. Load ports are shared between Hyper-Threads so cross
      thread leakage is possible.
      
      All variants have the same mitigation for single CPU thread case (SMT off),
      so the kernel can treat them as one MDS issue.
      
      Add the basic infrastructure to detect if the current CPU is affected by
      MDS.
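
A detection check of this kind usually combines a list of CPU models known to be unaffected with a capability bit through which newer hardware reports that it is not vulnerable. The sketch below only illustrates that shape; the capability bit name and its position (bit 5) are assumptions not stated in this changelog, and cpu_has_mds_bug() is a hypothetical function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed "not affected by MDS" capability bit; position is an assumption. */
#define EX_ARCH_CAP_MDS_NO (1ULL << 5)

/*
 * Treat the CPU as affected unless the model is known to be unaffected or
 * the hardware itself reports that it is not vulnerable.
 */
static bool cpu_has_mds_bug(bool model_known_unaffected, uint64_t arch_caps)
{
    if (model_known_unaffected)
        return false;
    return (arch_caps & EX_ARCH_CAP_MDS_NO) == 0;
}
```
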
      
      [ tglx: Rewrote changelog ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Reviewed-by: Jon Masters <jcm@redhat.com>
      Tested-by: Jon Masters <jcm@redhat.com>
      ed5194c2
    • T
      x86/msr-index: Cleanup bit defines · d8eabc37
      Thomas Gleixner committed
      Greg pointed out that speculation related bit defines are using (1 << N)
      format instead of BIT(N). Aside from that, (1 << N) is wrong as it should
      use 1UL at least.
      
      Clean it up.
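
To illustrate why the unsigned form matters: with a plain int constant, (1 << 31) shifts into the sign bit, which is undefined behavior in C, whereas 1UL is unsigned and wide enough. The sketch below re-defines BIT() locally as a userspace stand-in for the kernel helper; the two EX_FLAG_* names are hypothetical.

```c
/* Userspace stand-in for the kernel's BIT() helper. */
#define BIT(nr) (1UL << (nr))

/* Hypothetical flag definitions written the BIT(N) way. */
#define EX_FLAG_LOW  BIT(0)
#define EX_FLAG_HIGH BIT(31)

/* Combine both flags; the result is an unsigned long, never negative. */
static unsigned long ex_both_flags(void)
{
    return EX_FLAG_LOW | EX_FLAG_HIGH;
}
```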
      
      [ Josh Poimboeuf: Fix tools build ]
      Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Reviewed-by: Jon Masters <jcm@redhat.com>
      Tested-by: Jon Masters <jcm@redhat.com>
      d8eabc37
  24. 06 Mar, 2019 1 commit
  25. 21 Dec, 2018 3 commits
  26. 19 Dec, 2018 1 commit
  27. 28 Nov, 2018 1 commit
    • T
      x86/speculation: Prepare for per task indirect branch speculation control · 5bfbe3ad
      Tim Chen committed
      To avoid the overhead of STIBP always on, it's necessary to allow per task
      control of STIBP.
      
      Add a new task flag TIF_SPEC_IB and evaluate it during context switch if
      SMT is active and flag evaluation is enabled by the speculation control
      code. Add the conditional evaluation to x86_virt_spec_ctrl() as well so the
      guest/host switch works properly.
      
      This has no effect because TIF_SPEC_IB cannot be set yet and the static key
      which controls evaluation is off. Preparatory patch for adding the control
      code.
      
      [ tglx: Simplify the context switch logic and make the TIF evaluation
        	depend on SMP=y and on the static key controlling the conditional
        	update. Rename it to TIF_SPEC_IB because it controls both STIBP and
        	IBPB ]
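
The conditional evaluation can be pictured as translating one task flag bit into the STIBP bit of the speculation control value, and only doing so when SMT is active and the evaluation is switched on. The sketch below is an illustration, not the kernel code: the task flag position (bit 9) is an assumption, STIBP is taken to be bit 1 of the control value, and all names are hypothetical.

```c
#include <stdint.h>

#define EX_TIF_SPEC_IB           9   /* assumed task flag bit number */
#define EX_TIF_SPEC_IB_MASK      (1UL << EX_TIF_SPEC_IB)
#define EX_SPEC_CTRL_STIBP_SHIFT 1   /* assumed STIBP bit position */
#define EX_SPEC_CTRL_STIBP       (1UL << EX_SPEC_CTRL_STIBP_SHIFT)

/*
 * Return the STIBP bit to merge into the speculation control value.
 * Nothing is returned unless SMT is active and evaluation is enabled.
 */
static uint64_t stibp_from_tif(unsigned long tif_flags, int smt_active,
                               int evaluation_enabled)
{
    if (!smt_active || !evaluation_enabled)
        return 0;
    /* Shift the flag bit down so that it lands on the STIBP position. */
    return (tif_flags & EX_TIF_SPEC_IB_MASK) >>
           (EX_TIF_SPEC_IB - EX_SPEC_CTRL_STIBP_SHIFT);
}
```
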
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.176917199@linutronix.de
      
      5bfbe3ad
  28. 02 Oct, 2018 1 commit
    • A
      perf/x86/intel: Add a separate Arch Perfmon v4 PMI handler · af3bdb99
      Andi Kleen committed
      Implement counter freezing for Arch Perfmon v4 (Skylake and
      newer). This speeds up the PMI handler by avoiding
      unnecessary MSR writes and makes it more accurate.
      
      The Arch Perfmon v4 PMI handler is substantially different than
      the older PMI handler.
      
      Differences to the old handler:
      
      - It relies on counter freezing, which eliminates several MSR
        writes from the PMI handler and lowers the overhead significantly.
      
        It makes the PMI handler more accurate, as all counters get
        frozen atomically as soon as any counter overflows. So there is
        much less counting of the PMI handler itself.
      
        With the freezing we don't need to disable or enable counters or
        PEBS. Only BTS which does not support auto-freezing still needs to
        be explicitly managed.
      
      - The PMU acking is done at the end, not the beginning.
        This makes it possible to avoid manual enabling/disabling
        of the PMU, instead we just rely on the freezing/acking.
      
      - The APIC is acked before reenabling the PMU, which avoids
        problems with LBRs occasionally not getting unfrozen on Skylake.
      
      - Looping is only needed to work around a corner case in which several
        PMIs are very close to each other. In the common case the counters are
        frozen during the PMI handler, so no re-check is needed.
      
      This patch:
      
      - Adds code to enable v4 counter freezing.
      - Forks the <=v3 and >=v4 PMI handlers into separate functions.
      - Adds a kernel parameter to disable counter freezing. It took some time to
        debug counter freezing, so an option to turn it off was added in case new
        problems show up. It is not expected to be needed unless new bugs appear.
      - Covers big core only. The patch for small core will be posted later
        separately.
      
      Performance:
      
      When profiling a kernel build on Kabylake with different perf options,
      measuring the length of all NMI handlers using the nmi handler
      trace point:
      
      V3 is without counter freezing.
      V4 is with counter freezing.
      The value is the average cost of the PMI handler.
      (lower is better)
      
      perf options                    V3(ns)  V4(ns)  delta
      -c 100000                       1088    894     -18%
      -g -c 100000                    1862    1646    -12%
      --call-graph lbr -c 100000      3649    3367    -8%
      --call-graph dwarf -c 100000    2248    1982    -12%
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/1533712328-2834-2-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      af3bdb99
  29. 05 Aug, 2018 1 commit
  30. 05 Jul, 2018 1 commit
    • P
      x86/KVM/VMX: Add L1D MSR based flush · 3fa045be
      Paolo Bonzini committed
      336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
      (IA32_FLUSH_CMD aka 0x10B) which has similar write-only semantics to other
      MSRs defined in the document.
      
      The semantics of this MSR is to allow "finer granularity invalidation of
      caching structures than existing mechanisms like WBINVD. It will writeback
      and invalidate the L1 data cache, including all cachelines brought in by
      preceding instructions, without invalidating all caches (eg. L2 or
      LLC). Some processors may also invalidate the first level instruction
      cache on a L1D_FLUSH command. The L1 data and instruction caches may be
      shared across the logical processors of a core."
      
      Use it instead of the loop based L1 flush algorithm.
      
      A copy of this document is available at
         https://bugzilla.kernel.org/show_bug.cgi?id=199511
      
      [ tglx: Avoid allocating pages when the MSR is available ]
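
The choice between the MSR based flush and the loop based fallback can be sketched as a small decision function. This is an illustration only: it does not touch any MSR, the command bit value (bit 0) is an assumption not given in the text, and struct l1d_flush_plan and plan_l1d_flush() are hypothetical names. Only the MSR number 0x10B comes from the changelog.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_FLUSH_CMD 0x10b        /* MSR number from the changelog */
#define EX_L1D_FLUSH       (1ULL << 0)  /* assumed flush command bit */

/* Describes which flush method to use; msr and val are valid if use_msr. */
struct l1d_flush_plan {
    bool     use_msr;
    uint32_t msr;
    uint64_t val;
};

/* Prefer a single MSR write when available, else fall back to the loop. */
static struct l1d_flush_plan plan_l1d_flush(bool has_flush_msr)
{
    struct l1d_flush_plan plan = { false, 0, 0 };

    if (has_flush_msr) {
        plan.use_msr = true;
        plan.msr = MSR_IA32_FLUSH_CMD;
        plan.val = EX_L1D_FLUSH;
    }
    return plan;
}
```

When the MSR path is chosen, the pages used by the loop based algorithm never need to be allocated, which is what the note above refers to.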
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      3fa045be
  31. 02 Jun, 2018 1 commit
  32. 18 May, 2018 1 commit
  33. 17 May, 2018 1 commit