- 08 June 2022 (6 commits)
-
-
Submitted by Like Xu
Bit 12 of IA32_MISC_ENABLE represents "Processor Event Based Sampling Unavailable (RO)": 1 = PEBS is not supported; 0 = PEBS is supported. A write to this read-only PEBS_UNAVL bit raises #GP(0) when guest PEBS is enabled. Some PEBS drivers in the guest may care about this bit.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20220411101946.20262-13-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
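A minimal sketch of the #GP condition in standalone C; the helper name and the guest_pebs_enabled flag are illustrative, not the actual KVM handler:

    #include <stdbool.h>
    #include <stdint.h>

    #define MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL (1ULL << 12)

    /* Reject the WRMSR with #GP(0) if it tries to toggle the RO bit. */
    static bool misc_enable_write_faults(uint64_t old_val, uint64_t new_val,
                                         bool guest_pebs_enabled)
    {
        return guest_pebs_enabled &&
               ((old_val ^ new_val) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL);
    }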
-
Submitted by Like Xu
If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, adaptive PEBS is supported. The PEBS_DATA_CFG MSR and the adaptive record enable bits (IA32_PERFEVTSELx.Adaptive_Record and IA32_FIXED_CTR_CTRL.FCx_Adaptive_Record) are also supported. Adaptive PEBS gives software the capability to configure PEBS records to capture only the data of interest, keeping the record size compact. An overflow of PMCx results in the generation of an adaptive PEBS record with state information based on the selections specified in MSR_PEBS_DATA_CFG. By default, the record contains only the Basic group. When guest adaptive PEBS is enabled, the IA32_PEBS_ENABLE MSR is added to perf_guest_switch_msr() and switched during the VMX transitions, just like the CORE_PERF_GLOBAL_CTRL MSR. According to the Intel SDM, software is recommended to use PEBS Baseline when the following is true: IA32_PERF_CAPABILITIES.PEBS_BASELINE[14] && IA32_PERF_CAPABILITIES.PEBS_FMT[11:8] ≥ 4.

Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-12-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
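The SDM recommendation quoted above, as a standalone C sketch (field names follow the SDM; the helper itself is illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    #define PERF_CAP_PEBS_BASELINE  (1ULL << 14)
    #define PERF_CAP_PEBS_FMT_SHIFT 8
    #define PERF_CAP_PEBS_FMT_MASK  0xfULL

    static bool pebs_baseline_recommended(uint64_t perf_capabilities)
    {
        uint64_t fmt = (perf_capabilities >> PERF_CAP_PEBS_FMT_SHIFT) &
                       PERF_CAP_PEBS_FMT_MASK;

        return (perf_capabilities & PERF_CAP_PEBS_BASELINE) && fmt >= 4;
    }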
-
Submitted by Like Xu
When CPUID.01H:EDX.DS[21] is set, the IA32_DS_AREA MSR exists and points to the linear address of the first byte of the DS buffer management area, which is used to manage PEBS records. When guest PEBS is enabled, the MSR_IA32_DS_AREA MSR is added to perf_guest_switch_msr() and switched during the VMX transitions, just like the CORE_PERF_GLOBAL_CTRL MSR. A WRMSR to the IA32_DS_AREA MSR raises #GP(0) if the source register contains a non-canonical address.

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-11-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
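A sketch of the canonical-address check, assuming a 48-bit virtual address space; the real KVM code also accounts for 5-level paging (LA57):

    #include <stdbool.h>
    #include <stdint.h>

    /* An address is canonical iff bits 63:47 are a sign extension of bit 47. */
    static bool is_noncanonical_48(uint64_t addr)
    {
        return (uint64_t)((int64_t)(addr << 16) >> 16) != addr;
    }

    /* A WRMSR handler would do:
     *   if (is_noncanonical_48(data)) inject #GP(0); else ds_area = data;
     */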
-
Submitted by Like Xu
If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the IA32_PEBS_ENABLE MSR exists, and all architecturally enumerated fixed and general-purpose counters have corresponding bits in IA32_PEBS_ENABLE that enable generation of PEBS records. The general-purpose counter bits start at bit IA32_PEBS_ENABLE[0], and the fixed counter bits start at bit IA32_PEBS_ENABLE[32]. When guest PEBS is enabled, the IA32_PEBS_ENABLE MSR is added to perf_guest_switch_msr() and atomically switched during the VMX transitions, just like the CORE_PERF_GLOBAL_CTRL MSR. Based on whether the platform supports x86_pmu.pebs_ept, the way additional MSRs are added to arr[] in intel_guest_get_msrs() is also refactored for extensibility.

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-8-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
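A sketch of appending an entry to the atomically switched MSR list; the struct layout mirrors perf's guest-switch array but is simplified, and the helper is illustrative:

    #include <stdint.h>

    #define MSR_IA32_PEBS_ENABLE 0x3f1

    struct perf_guest_switch_msr {
        unsigned int msr;
        uint64_t host, guest;
    };

    /* Append IA32_PEBS_ENABLE to the MSRs switched at VM-entry/VM-exit;
     * returns the new entry count. */
    static int add_pebs_enable(struct perf_guest_switch_msr *arr, int nr,
                               uint64_t host_val, uint64_t guest_val)
    {
        arr[nr].msr = MSR_IA32_PEBS_ENABLE;
        arr[nr].host = host_val;
        arr[nr].guest = guest_val;
        return nr + 1;
    }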
-
Submitted by Like Xu
The mask value of the fixed counter control register should be dynamically adjusted with the number of fixed counters. This patch introduces a variable that holds the reserved bits of the fixed counter control registers. This is a generic code refactoring.

Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-6-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
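A sketch of computing such a reserved-bits mask. Each fixed counter owns a 4-bit control field in MSR_CORE_PERF_FIXED_CTR_CTRL; this sketch assumes the EN_OS/EN_USR/PMI bits (0xb per field) are the valid ones and keeps the AnyThread bit reserved:

    #include <stdint.h>

    static uint64_t fixed_ctr_ctrl_reserved_bits(int nr_fixed_counters)
    {
        uint64_t mask = ~0ULL; /* start with everything reserved */
        int i;

        for (i = 0; i < nr_fixed_counters; i++)
            mask &= ~(0xbULL << (i * 4)); /* clear valid bits per counter */
        return mask;
    }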
-
Submitted by Like Xu
On Intel platforms, software can use the IA32_MISC_ENABLE[7] bit to detect whether the processor supports the performance monitoring facility. Its value depends on whether the PMU is enabled for the guest, and a software write to this availability bit is ignored. Ignoring the toggle in KVM is the way to go, and that behavior matches bare metal.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-5-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 22 April 2022 (1 commit)
-
-
Submitted by Like Xu
The NMI watchdog is one of the favorite features of kernel developers, but it does not work in an AMD guest even with the vPMU enabled; worse, the system misrepresents this capability via /proc. This is a PMC emulation error. KVM does not pass the latest valid value to the perf_event in time when the guest NMI watchdog is running, so the perf_event corresponding to the watchdog counter falls back to a stale state at some point after the first guest NMI injection, forcing the hardware register PMC0 to be constantly written with 0x800000000001. Instead, the running counter should accurately reflect its new value based on the latest coordinated pmc->counter (from the vPMC's point of view) rather than the value written directly by the guest.

Fixes: 168d918f ("KVM: x86: Adjust counter sample period after a wrmsr")
Reported-by: Dongli Cao <caodongli@kingsoft.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Tested-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220409015226.38619-1-likexu@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
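The spirit of the fix, as a standalone C sketch (struct and helper names are illustrative, not the KVM code): every guest write re-derives the perf state from the latest pmc->counter, so the backing event never keeps running on a stale period.

    #include <stdint.h>

    struct pmc { uint64_t counter, bitmask, sample_period; };

    /* Re-derive the overflow distance from the up-to-date counter value. */
    static void reprogram_period(struct pmc *pmc)
    {
        pmc->sample_period = (-pmc->counter) & pmc->bitmask;
    }

    static void pmc_write_counter(struct pmc *pmc, uint64_t val)
    {
        pmc->counter = val & pmc->bitmask;
        reprogram_period(pmc);
    }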
-
- 14 April 2022 (1 commit)
-
-
Submitted by Like Xu
Move pmu_ops to kvm_x86_init_ops and tag it as __initdata. That saves those precious few bytes and, more importantly, makes the original ops unreachable, i.e. makes it harder to sneak in post-init modification bugs.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220329235054.3534728-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 02 April 2022 (2 commits)
-
-
Submitted by Like Xu
The HSW_IN_TX* bits are used in generic code but are not supported on AMD. Worse, these bits overlap with AMD's EventSelect[11:8], so using the HSW_IN_TX* bits unconditionally in generic code results in unintended PMU behavior on AMD. For example, if EventSelect[11:8] is 0x2, pmc_reprogram_counter() wrongly assumes that HSW_IN_TX_CHECKPOINTED is set and thus forces the sampling period to 0. Also, per the SDM, both bits 32 and 33 "may only be set if the processor supports HLE or RTM", and for "IN_TXCP (bit 33): this bit may only be set for IA32_PERFEVTSEL2". Opportunistically eliminate code redundancy: if an HSW_IN_TX* bit is set in pmc->eventsel, it is already set in attr.config.

Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Reported-by: Jim Mattson <jmattson@google.com>
Fixes: 103af0a9 ("perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5")
Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220309084257.88931-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
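A small standalone demonstration of the overlap described above (AMD's extended EventSelect[11:8] lives in MSR bits 35:32; the bit positions come from the respective vendor manuals):

    #include <assert.h>
    #include <stdint.h>

    #define HSW_IN_TX              (1ULL << 32) /* Intel TSX modifier */
    #define HSW_IN_TX_CHECKPOINTED (1ULL << 33) /* Intel TSX modifier */

    int main(void)
    {
        /* EventSelect[11:8] = 0x2 puts a 1 in bit 33, which generic code
         * would misread as IN_TXCP on AMD: */
        uint64_t amd_eventsel = 0x2ULL << 32;

        assert(amd_eventsel & HSW_IN_TX_CHECKPOINTED);
        return 0;
    }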
-
Submitted by Jim Mattson
The third nybble of AMD's event select overlaps with Intel's IN_TX and IN_TXCP bits, so AMD64_RAW_EVENT_MASK cannot be used on Intel platforms that support TSX. Declare a raw_event_mask in the kvm_pmu structure, initialize it in the vendor-specific pmu_refresh() functions, and use that mask for PERF_TYPE_RAW configurations in reprogram_gp_counter().

Fixes: 710c4765 ("KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW")
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220308012452.3468611-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 25 February 2022 (1 commit)
-
-
Submitted by David Dunn
Add a new capability, KVM_CAP_PMU_CAPABILITY, that takes a bitmask of settings/features, allowing userspace to configure PMU virtualization on a per-VM basis. For now, support a single flag, KVM_PMU_CAP_DISABLE, to allow disabling PMU virtualization for a VM even when KVM is configured with enable_pmu=true at the module level. To keep KVM simple, disallow changing a VM's PMU configuration after vCPUs have been created.

Signed-off-by: David Dunn <daviddunn@google.com>
Message-Id: <20220223225743.2703915-2-daviddunn@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
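A sketch of how userspace might use the capability (the constants come from the kvm uapi headers on a kernel that has this capability; error handling omitted):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* vm_fd is an open VM file descriptor; per the restriction above,
     * this must be called before any vCPU is created. */
    static int disable_vpmu(int vm_fd)
    {
        struct kvm_enable_cap cap = {
            .cap = KVM_CAP_PMU_CAPABILITY,
            .args = { KVM_PMU_CAP_DISABLE },
        };

        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    }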
-
- 11 February 2022 (1 commit)
-
-
Submitted by Sean Christopherson
Refactor the nested VMX PMU refresh helper to pass it a flag stating whether or not the vCPU has PERF_GLOBAL_CTRL, instead of having the nVMX helper query the information by bouncing through kvm_x86_ops.pmu_ops. This will allow a future patch to use static_call() for the PMU ops without having to export any static-call definitions from common x86, and it is also a step toward an unexported kvm_x86_ops. Alternatively, nVMX could call kvm_pmu_is_valid_msr() to indirectly use kvm_x86_ops.pmu_ops, but that would incur an extra layer of indirection and would require exporting kvm_pmu_is_valid_msr(). Opportunistically rename the helper to keep line lengths somewhat reasonable and to better capture its high-level role. No functional change intended.

Cc: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 02 February 2022 (1 commit)
-
-
Submitted by Wei Wang
The KVM vPMU doesn't support emulating all the fixed counters that the host PMU driver supports; e.g. fixed counter 3, used by Topdown metrics, hasn't been supported by KVM so far. Rename MAX_FIXED_COUNTERS to KVM_PMC_MAX_FIXED for a more straightforward naming convention, matching the INTEL_PMC_MAX_FIXED used by the host PMU driver, and fix the vPMU to use the KVM-side KVM_PMC_MAX_FIXED for virtual fixed counter emulation instead of the host-side INTEL_PMC_MAX_FIXED.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1643750603-100733-2-git-send-email-kan.liang@linux.intel.com
-
- 18 January 2022 (2 commits)
-
-
Submitted by Like Xu
The new module parameter to control PMU virtualization should apply to Intel as well as AMD, for situations where userspace is not trusted. If the module parameter allows PMU virtualization, there could be a new KVM_CAP or guest CPUID bits whereby userspace can enable/disable PMU virtualization on a per-VM basis. If the module parameter does not allow PMU virtualization, there should be no userspace override, since we have no precedent for authorizing that kind of override. If it's false, other counter-based profiling features (such as LBR, including the associated CPUID bits, if any) will not be exposed. Change its name from "pmu" to "enable_pmu", as we have temporary variables with the same name in our code, like "struct kvm_pmu *pmu".

Fixes: b1d66dad ("KVM: x86/svm: Add module param to control PMU virtualization")
Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220111073823.21885-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Like Xu
According to the CPUID 0x0A.EBX bit vector, event [7] should be the unrealized event "Topdown Slots" rather than the *kernel* generalized common hardware event "REF_CPU_CYCLES", so we need to skip the CPUID unavailability check in intel_pmc_perf_hw_id() for the last REF_CPU_CYCLES event and update the confusing comment. If an event is marked as unavailable in the Intel guest CPUID 0AH.EBX leaf, we need to avoid any perf_event creation, whether it's for a gp or a fixed counter. To distinguish a rejected event from an event that needs to be programmed with the PERF_TYPE_RAW type, a new special return value of "PERF_COUNT_HW_MAX + 1" is introduced.

Fixes: 62079d8a ("KVM: PMU: add proper support for fixed counter 2")
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220105051509.69437-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
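A sketch of the sentinel scheme; which of the two cases gets which value is this sketch's assumption, not something the message spells out:

    #include <linux/perf_event.h>
    #include <stdbool.h>

    /* ids below PERF_COUNT_HW_MAX: generic hardware events;
     * PERF_COUNT_HW_MAX: no generic mapping, program as PERF_TYPE_RAW;
     * PERF_COUNT_HW_MAX + 1: marked unavailable in guest CPUID 0AH.EBX. */
    static bool event_rejected(unsigned int hw_id)
    {
        return hw_id == PERF_COUNT_HW_MAX + 1;
    }

    static bool event_needs_raw_type(unsigned int hw_id)
    {
        return hw_id == PERF_COUNT_HW_MAX;
    }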
-
- 07 January 2022 (4 commits)
-
-
Submitted by Like Xu
Since we set the same semantic event value for a fixed counter in pmc->eventsel, returning the perf_hw_id for a fixed counter via find_fixed_event() can be painlessly replaced by pmc_perf_hw_id() with the help of a pmc_is_fixed() check.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-4-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Like Xu
find_arch_event() returns an "unsigned int" value, which is used by pmc_reprogram_counter() to program a PERF_TYPE_HARDWARE-type perf_event. The returned value is actually the kernel-defined generic perf_hw_id; rename the function to pmc_perf_hw_id(), with simpler incoming parameters, for better self-explanation.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-3-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Like Xu
The current pmc->eventsel for fixed counters is underutilized. pmc->eventsel can be set up for all known available fixed counters, since we have a mapping between the fixed pmc index and the intel_arch_events array. Whether for a gp or a fixed counter, this simplifies the later consistency checks between eventsel and perf_hw_id.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini
Because Ice Lake has 4 fixed performance counters but KVM supports only 3, it is possible for reprogram_fixed_counters to pass reprogram_fixed_counter an index that is out of bounds for the fixed_pmc_events array. Ultimately intel_find_fixed_event, the only place that uses fixed_pmc_events, handles this correctly because it checks against the size of fixed_pmc_events anyway. Every other place operates on the fixed_counters[] array, which is sized according to INTEL_PMC_MAX_FIXED. However, it is cleaner if the unsupported performance counters are culled early in reprogram_fixed_counters.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
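A sketch of culling early: zero the 4-bit control fields of fixed counters beyond what KVM supports before iterating (the helper is illustrative, not the actual reprogram_fixed_counters code):

    #include <stdint.h>

    /* e.g. nr_supported = 3 on a 4-fixed-counter Ice Lake host. */
    static uint64_t cull_fixed_ctr_ctrl(uint64_t data, int nr_supported)
    {
        uint64_t valid = (nr_supported >= 16)
                       ? ~0ULL
                       : (1ULL << (nr_supported * 4)) - 1;
        return data & valid;
    }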
-
- 11 November 2021 (1 commit)
-
-
Submitted by Jim Mattson
These function names sound like predicates, and they have siblings, *is_valid_msr(), which _are_ predicates. Moreover, there are comments that essentially warn that these functions behave unexpectedly. Flip the polarity of the return values so that they become predicates, and convert the boolean result to a success/failure code at the outer call site.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211105202058.1048757-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 22 October 2021 (1 commit)
-
-
Submitted by Wanpeng Li
SDM section 18.2.3 mentions: "The IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of any general-purpose or fixed-function counters via a single WRMSR." It is documented as R/W by the SDM; reading this MSR on bare metal during perf testing, the value is always 0 for the ICX/SKX boxes on hand. Let's have get_msr fill MSR_CORE_PERF_GLOBAL_OVF_CTRL with 0, matching hardware behavior, and drop the global_ovf_ctrl variable.

Tested-by: Like Xu <likexu@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1634631160-67276-2-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
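A sketch of the resulting behavior (simplified standalone C; the handler names and the bare global variable are illustrative):

    #include <stdint.h>

    static uint64_t global_status; /* MSR_CORE_PERF_GLOBAL_STATUS image */

    /* WRMSR clears the selected overflow indicators... */
    static void write_global_ovf_ctrl(uint64_t data)
    {
        global_status &= ~data;
    }

    /* ...while RDMSR simply observes 0, matching bare metal on ICX/SKX. */
    static uint64_t read_global_ovf_ctrl(void)
    {
        return 0;
    }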
-
- 04 August 2021 (1 commit)
-
-
Submitted by Like Xu
Based on our observations, after any vm-exit associated with the vPMU, there are at least two or more perf interfaces to be called for guest counter emulation, such as perf_event_{pause, read_value, period}(), and each one will {lock, unlock} the same perf_event_ctx. The frequency of calls becomes more severe when the guest uses counters in a multiplexed manner. Holding a lock once and completing the KVM request operations in the perf context would introduce a set of impractical new interfaces. So we can further optimize the vPMU implementation by avoiding repeated calls to these interfaces in the KVM context, for at least one pattern: after we call perf_event_pause() once, the event is disabled and its internal count is reset to 0, so there is no need to pause it again or read its value. Once the event is paused, the event period will not be updated until the next time it's resumed or reprogrammed. There is also no need to call perf_event_period() twice for a non-running counter, considering that the perf_event for a running counter is never paused. Based on this implementation, for the following common usage of sampling 4 events using perf on a 4u8g guest:

    echo 0 > /proc/sys/kernel/watchdog
    echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
    echo 10000 > /proc/sys/kernel/perf_event_max_sample_rate
    echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
    for i in `seq 1 1 10`
    do
        taskset -c 0 perf record \
            -e cpu-cycles -e instructions -e branch-instructions -e cache-misses \
            /root/br_instr a
    done

the average latency of the guest NMI handler is reduced from 37646.7 ns to 32929.3 ns (~1.14x speedup) on the Intel ICX server. Also, in addition to collecting more samples, no loss of sampling accuracy was observed compared to before the optimization.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20210728120705.6855-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
-
- 24 February 2021 (1 commit)
-
-
Submitted by Like Xu
If lbr_desc->event is successfully created, intel_pmu_create_guest_lbr_event() returns 0; otherwise it returns -ENOENT and then jumps to the LBR MSRs dummy handling.

Fixes: 1b5ac322 ("KVM: vmx/pmu: Pass-through LBR msrs when the guest LBR event is ACTIVE")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210223013958.1280444-1-like.xu@linux.intel.com>
[Add "< 0" and PTR_ERR to make the code clearer. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 04 February 2021 (8 commits)
-
-
Submitted by Like Xu
The vPMU uses GUEST_LBR_IN_USE_IDX (bit 58) in 'pmu->pmc_in_use' to indicate whether a guest LBR event is still needed by the vcpu. If the vcpu no longer accesses LBR-related registers within a scheduling time slice and the LBR enable bit has been unset, the vPMU treats the guest LBR event as a bland event of a vPMC counter and releases it as usual. The pass-through state of the LBR records MSRs is also cancelled.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210201051039.255478-10-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Like Xu
The current vPMU only supports Architecture Version 2. According to Intel SDM "17.4.7 Freezing LBR and Performance Counters on PMI", if IA32_DEBUGCTL.Freeze_LBR_On_PMI = 1, the LBR is frozen on the virtual PMI, and KVM emulates this by clearing the LBR bit (bit 0) in IA32_DEBUGCTL. The guest then needs to re-enable IA32_DEBUGCTL.LBR to resume recording branches.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Message-Id: <20210201051039.255478-9-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
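A sketch of the emulated freeze (bit positions per the SDM DEBUGCTL layout; the helper is illustrative):

    #include <stdint.h>

    #define DEBUGCTLMSR_LBR                (1ULL << 0)
    #define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI (1ULL << 11)

    /* On a virtual PMI, clear DEBUGCTL.LBR so the guest sees frozen records
     * and must set the bit again to resume recording. */
    static uint64_t debugctl_on_pmi(uint64_t debugctl)
    {
        if (debugctl & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI)
            debugctl &= ~DEBUGCTLMSR_LBR;
        return debugctl;
    }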
-
Submitted by Like Xu
When the LBR records MSRs have already been passed through, there is no need to call vmx_update_intercept_for_lbr_msrs() again and again, and vice versa.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Message-Id: <20210201051039.255478-8-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
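The idempotence pattern in sketch form (the struct and field names are illustrative, not the exact KVM ones):

    #include <stdbool.h>

    struct lbr_desc { bool msr_passthrough; };

    /* Only rewrite the MSR intercept bitmap on an actual state change. */
    static void lbr_set_passthrough(struct lbr_desc *lbr, bool enable)
    {
        if (lbr->msr_passthrough == enable)
            return;
        /* vmx_update_intercept_for_lbr_msrs(...) would run here */
        lbr->msr_passthrough = enable;
    }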
-
Submitted by Like Xu
In addition to DEBUGCTLMSR_LBR, any KVM trap caused by an LBR MSRs access will result in the creation of a guest LBR event per vcpu. If the guest LBR event is scheduled on with the corresponding vcpu context, KVM passes through all LBR records MSRs to the guest. The LBR callstack mechanism implemented in the host helps save/restore the guest LBR records during the event context switches, which saves a lot of overhead compared to saving/restoring tens of LBR MSRs (e.g. 32 LBR records entries) in the much more frequent VMX transitions. To avoid reclaiming LBR resources from any higher-priority event on the host, KVM always checks the existence of the guest LBR event and its state before vm-entry, as late as possible. A negative result cancels the pass-through state, and it also prevents real register accesses and potential data leakage. If the host reclaims the LBR between two checks, the interception state and LBR records can be safely preserved thanks to native save/restore support from the guest LBR event. KVM emits a pr_warn() when the LBR hardware is unavailable to the guest LBR event. The administrator is supposed to remind users that the guest result may be inaccurate if someone is using the LBR to record the hypervisor on the host side.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Message-Id: <20210201051039.255478-7-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Like Xu
When the vcpu sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, the KVM handler creates a guest LBR event which enables the callstack mode and to which no hardware counter is assigned. The host perf schedules and enables this event as usual, but in an exclusive way. The guest LBR event is released when the vPMU is reset; soon, though, the lazy release mechanism will be applied to this event like a vPMC.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Message-Id: <20210201051039.255478-6-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
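A sketch of the shape of such an event request, assuming the perf uapi branch-sampling flags; the exact attr fields KVM uses may differ, this only illustrates callstack-mode branch sampling without consuming a hardware counter:

    #include <linux/perf_event.h>
    #include <string.h>

    static void init_guest_lbr_attr(struct perf_event_attr *attr)
    {
        memset(attr, 0, sizeof(*attr));
        attr->type = PERF_TYPE_RAW;
        attr->size = sizeof(*attr);
        attr->pinned = 1; /* scheduled exclusively, as described above */
        attr->sample_type = PERF_SAMPLE_BRANCH_STACK;
        attr->branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
                                   PERF_SAMPLE_BRANCH_USER;
    }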
-
Submitted by Like Xu
Userspace could set bits [0, 5] of the IA32_PERF_CAPABILITIES MSR, which tell about the record format stored in the LBR records. LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vcpu model is compatible with the host one.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
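A sketch of the format-compatibility check (the field position follows the SDM; the helper is illustrative, the real code obtains the host format via x86_perf_get_lbr()):

    #include <stdbool.h>
    #include <stdint.h>

    #define PERF_CAP_LBR_FMT 0x3fULL /* IA32_PERF_CAPABILITIES[5:0] */

    static bool lbr_fmt_compatible(uint64_t guest_caps, uint64_t host_caps)
    {
        return (guest_caps & PERF_CAP_LBR_FMT) ==
               (host_caps & PERF_CAP_LBR_FMT);
    }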
-
Submitted by Paolo Bonzini
Userspace could set bits [0, 5] of the IA32_PERF_CAPABILITIES MSR, which tell about the record format stored in the LBR records. LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vcpu model is compatible with the host one.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini
Once MSR_IA32_PERF_CAPABILITIES is changed via vmx_set_msr(), the value should not be changed by cpuid(). To ensure that the new value is kept, the default initialization path is moved to intel_pmu_init(). The effective value of the MSR will be 0 if PDCM is clear, however.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 26 January 2021 (2 commits)
-
-
Submitted by Like Xu
The HW_REF_CPU_CYCLES event on fixed counter 2 is pseudo-encoded as 0x0300 in intel_perfmon_event_map[]. Correct its usage.

Fixes: 62079d8a ("KVM: PMU: add proper support for fixed counter 2")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20201230081916.63417-1-like.xu@linux.intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
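The pseudo-encoding spelled out as a sketch (struct and variable names are illustrative): 0x0300 is unit mask 0x03 with event code 0x00, a combination that names the fixed counter 2 event but is not programmable in IA32_PERFEVTSELx.

    #include <stdint.h>

    struct pmu_event { uint8_t eventsel; uint8_t unit_mask; };

    static const struct pmu_event hw_ref_cpu_cycles = {
        .eventsel  = 0x00, /* low byte of 0x0300 */
        .unit_mask = 0x03, /* high byte of 0x0300 */
    };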
-
Submitted by Like Xu
Since we know the vPMU will not work properly when (1) the guest bit_width(s) of the [gp|fixed] counters are greater than the host ones, or (2) the guest requested architectural events exceed the range supported by the host, we can set up a smaller left shift value and refresh the guest cpuid entry, thus fixing the following UBSAN shift-out-of-bounds warning:

    shift exponent 197 is too large for 64-bit type 'long long unsigned int'

    Call Trace:
     __dump_stack lib/dump_stack.c:79 [inline]
     dump_stack+0x107/0x163 lib/dump_stack.c:120
     ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
     __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
     intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348
     kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177
     kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308
     kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709
     kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386
     vfs_ioctl fs/ioctl.c:48 [inline]
     __do_sys_ioctl fs/ioctl.c:753 [inline]
     __se_sys_ioctl fs/ioctl.c:739 [inline]
     __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
     do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: syzbot+ae488dc136a4cc6ba32b@syzkaller.appspotmail.com
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210118025800.34620-1-like.xu@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 11 July 2020 (1 commit)
-
-
Submitted by Vitaly Kuznetsov
The state_test/smm_test selftests are failing on AMD with: "Unexpected result from KVM_GET_MSRS, r: 51 (failed MSR was 0x345)". MSR_IA32_PERF_CAPABILITIES is an emulated MSR on Intel, but it is not known to the AMD code; we can move the emulation to common x86 code. For AMD, we basically just allow the host to read and write zero to the MSR.

Fixes: 27461da3 ("KVM: x86/pmu: Support full width counting")
Suggested-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200710152559.1645827-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 05 June 2020 (1 commit)
-
-
Submitted by Sean Christopherson
Unconditionally return true when querying the validity of MSR_IA32_PERF_CAPABILITIES, so as to defer the validity check to intel_pmu_{get,set}_msr(), which can properly give the MSR a pass when the access is initiated from host userspace. The MSR is emulated, so there is no underlying hardware dependency to worry about.

Fixes: 27461da3 ("KVM: x86/pmu: Support full width counting")
Cc: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200603203303.28545-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 01 June 2020 (2 commits)
-
-
Submitted by Like Xu
Intel CPUs have a new alternative MSR range (starting from MSR_IA32_PMC0) for GP counters that allows writing the full counter width. Enable this range from a new capability bit (IA32_PERF_CAPABILITIES.FW_WRITE[bit 13]). The guest queries CPUID to get the counter width and sign-extends the counter values as needed. The traditional MSRs always limit writes to 32 bits, even though the counter is internally larger (48 or 57 bits). When the new capability is set, use the alternative range, which does not have these restrictions. This lowers the overhead of perf stat slightly, because it has to take fewer interrupts to accumulate the counter value.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20200529074347.124619-3-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
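A sketch of the two write behaviors (standalone C; the helper is illustrative, the bit position of FW_WRITE follows the message above):

    #include <stdbool.h>
    #include <stdint.h>

    #define PERF_CAP_FW_WRITES (1ULL << 13) /* IA32_PERF_CAPABILITIES.FW_WRITE */

    /* Legacy counter MSR writes keep 32 bits, sign-extended into the counter
     * width; full-width writes (the MSR_IA32_PMC0 range) take every bit. */
    static uint64_t counter_write(uint64_t data, int width, bool full_width)
    {
        uint64_t mask = (width < 64) ? (1ULL << width) - 1 : ~0ULL;

        if (!full_width)
            data = (uint64_t)(int64_t)(int32_t)data;
        return data & mask;
    }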
-
Submitted by Wei Wang
Change kvm_pmu_get_msr() to take the msr_data struct, as the host_initiated field from the struct can be used by get_msr. This also makes the API consistent with kvm_pmu_set_msr. No functional changes.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Message-Id: <20200529074347.124619-2-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 17 March 2020 (2 commits)
-
-
Submitted by Sean Christopherson
Use vmx_pt_mode_is_host_guest() in intel_pmu_refresh() instead of bouncing through kvm_x86_ops->pt_supported, and remove ->pt_supported(), as the PMU code was the last remaining user. Opportunistically clean up the wording of a comment that referenced kvm_x86_ops->pt_supported(). No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Eric Hankland
The sample_period of a counter tracks when that counter will overflow and set the global status/trigger a PMI. However, this currently only gets set when the initial counter is created or when a counter is resumed; update the sample period after a wrmsr so that running counters accurately reflect their new value.

Signed-off-by: Eric Hankland <ehankland@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 05 February 2020 (1 commit)
-
-
Submitted by Eric Hankland
Correct the logic in intel_pmu_set_msr() for fixed and general-purpose counters. This was recently changed to set pmc->counter without taking into account the value of pmc_read_counter(), which will be incorrect if the counter is currently running and non-zero; this changes back to the old logic, which accounted for the value of currently running counters.

Signed-off-by: Eric Hankland <ehankland@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-