提交 · e11c1a7eb302ac8f6f47c18fa662546405a5fd83 · openeuler / Kernel

20 4月, 2021 15 次提交

perf/x86: Factor out x86_pmu_show_pmu_cap · e11c1a7e

由 Kan Liang 提交于 4月 12, 2021

The PMU capabilities are different among hybrid PMUs. Perf should dump
the PMU capabilities information for each hybrid PMU.

Factor out x86_pmu_show_pmu_cap() which shows the PMU capabilities
information. The function will be reused later when registering a
dedicated hybrid PMU.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1618237865-33448-16-git-send-email-kan.liang@linux.intel.com

e11c1a7e

perf/x86: Remove temporary pmu assignment in event_init · b9856729

由 Kan Liang 提交于 4月 12, 2021

The temporary pmu assignment in event_init is unnecessary.

The assignment was introduced by commit 8113070d ("perf_events:
Add fast-path to the rescheduling code"). At that time, event->pmu is
not assigned yet when initializing an event. The assignment is required.
However, from commit 7e5b2a01 ("perf: provide PMU when initing
events"), the event->pmu is provided before event_init is invoked.
The temporary pmu assignment in event_init should be removed.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1618237865-33448-15-git-send-email-kan.liang@linux.intel.com

b9856729

perf/x86/intel: Factor out intel_pmu_check_extra_regs · 34d5b61f

由 Kan Liang 提交于 4月 12, 2021

Each Hybrid PMU has to check and update its own extra registers before
registration.

The intel_pmu_check_extra_regs will be reused later to check the extra
registers of each hybrid PMU.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1618237865-33448-14-git-send-email-kan.liang@linux.intel.com

34d5b61f

perf/x86/intel: Factor out intel_pmu_check_event_constraints · bc14fe1b

由 Kan Liang 提交于 4月 12, 2021

Each Hybrid PMU has to check and update its own event constraints before
registration.

The intel_pmu_check_event_constraints will be reused later to check
the event constraints of each hybrid PMU.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1618237865-33448-13-git-send-email-kan.liang@linux.intel.com

bc14fe1b

perf/x86/intel: Factor out intel_pmu_check_num_counters · b8c4d1a8

由 Kan Liang 提交于 4月 12, 2021

Each Hybrid PMU has to check its own number of counters and mask fixed
counters before registration.

The intel_pmu_check_num_counters will be reused later to check the
number of the counters for each hybrid PMU.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1618237865-33448-12-git-send-email-kan.liang@linux.intel.com

b8c4d1a8

perf/x86: Hybrid PMU support for extra_regs · 183af736

由 Kan Liang 提交于 4月 12, 2021

Different hybrid PMU may have different extra registers, e.g. Core PMU
may have offcore registers, frontend register and ldlat register. Atom
core may only have offcore registers and ldlat register. Each hybrid PMU
should use its own extra_regs.

An Intel Hybrid system should always have extra registers.
Unconditionally allocate shared_regs for Intel Hybrid system.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1618237865-33448-11-git-send-email-kan.liang@linux.intel.com

183af736

perf/x86: Hybrid PMU support for event constraints · 24ee38ff

由 Kan Liang 提交于 4月 12, 2021

The events are different among hybrid PMUs. Each hybrid PMU should use
its own event constraints.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1618237865-33448-10-git-send-email-kan.liang@linux.intel.com

24ee38ff

perf/x86: Hybrid PMU support for hardware cache event · 0d18f2df

由 Kan Liang 提交于 4月 12, 2021

The hardware cache events are different among hybrid PMUs. Each hybrid
PMU should have its own hw cache event table.
Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1618237865-33448-9-git-send-email-kan.liang@linux.intel.com

0d18f2df

perf/x86: Hybrid PMU support for unconstrained · eaacf07d

由 Kan Liang 提交于 4月 12, 2021

The unconstrained value depends on the number of GP and fixed counters.
Each hybrid PMU should use its own unconstrained.
Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1618237865-33448-8-git-send-email-kan.liang@linux.intel.com

eaacf07d

perf/x86: Hybrid PMU support for counters · d4b294bf

由 Kan Liang 提交于 4月 12, 2021

The number of GP and fixed counters are different among hybrid PMUs.
Each hybrid PMU should use its own counter related information.

When handling a certain hybrid PMU, apply the number of counters from
the corresponding hybrid PMU.

When reserving the counters in the initialization of a new event,
reserve all possible counters.

The number of counter recored in the global x86_pmu is for the
architecture counters which are available for all hybrid PMUs. KVM
doesn't support the hybrid PMU yet. Return the number of the
architecture counters for now.

For the functions only available for the old platforms, e.g.,
intel_pmu_drain_pebs_nhm(), nothing is changed.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1618237865-33448-7-git-send-email-kan.liang@linux.intel.com

d4b294bf

perf/x86: Hybrid PMU support for intel_ctrl · fc4b8fca

由 Kan Liang 提交于 4月 12, 2021

The intel_ctrl is the counter mask of a PMU. The PMU counter information
may be different among hybrid PMUs, each hybrid PMU should use its own
intel_ctrl to check and access the counters.

When handling a certain hybrid PMU, apply the intel_ctrl from the
corresponding hybrid PMU.

When checking the HW existence, apply the PMU and number of counters
from the corresponding hybrid PMU as well. Perf will check the HW
existence for each Hybrid PMU before registration. Expose the
check_hw_exists() for a later patch.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1618237865-33448-6-git-send-email-kan.liang@linux.intel.com

fc4b8fca

perf/x86/intel: Hybrid PMU support for perf capabilities · d0946a88

由 Kan Liang 提交于 4月 12, 2021

Some platforms, e.g. Alder Lake, have hybrid architecture. Although most
PMU capabilities are the same, there are still some unique PMU
capabilities for different hybrid PMUs. Perf should register a dedicated
pmu for each hybrid PMU.

Add a new struct x86_hybrid_pmu, which saves the dedicated pmu and
capabilities for each hybrid PMU.

The architecture MSR, MSR_IA32_PERF_CAPABILITIES, only indicates the
architecture features which are available on all hybrid PMUs. The
architecture features are stored in the global x86_pmu.intel_cap.

For Alder Lake, the model-specific features are perf metrics and
PEBS-via-PT. The corresponding bits of the global x86_pmu.intel_cap
should be 0 for these two features. Perf should not use the global
intel_cap to check the features on a hybrid system.
Add a dedicated intel_cap in the x86_hybrid_pmu to store the
model-specific capabilities. Use the dedicated intel_cap to replace
the global intel_cap for thse two features. The dedicated intel_cap
will be set in the following "Add Alder Lake Hybrid support" patch.

Add is_hybrid() to distinguish a hybrid system. ADL may have an
alternative configuration. With that configuration, the
X86_FEATURE_HYBRID_CPU is not set. Perf cannot rely on the feature bit.
Add a new static_key_false, perf_is_hybrid, to indicate a hybrid system.
It will be assigned in the following "Add Alder Lake Hybrid support"
patch as well.
Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1618237865-33448-5-git-send-email-kan.liang@linux.intel.com

d0946a88

perf/x86: Track pmu in per-CPU cpu_hw_events · 61e76d53

由 Kan Liang 提交于 4月 12, 2021

Some platforms, e.g. Alder Lake, have hybrid architecture. In the same
package, there may be more than one type of CPU. The PMU capabilities
are different among different types of CPU. Perf will register a
dedicated PMU for each type of CPU.

Add a 'pmu' variable in the struct cpu_hw_events to track the dedicated
PMU of the current CPU.

Current x86_get_pmu() use the global 'pmu', which will be broken on a
hybrid platform. Modify it to apply the 'pmu' of the specific CPU.

Initialize the per-CPU 'pmu' variable with the global 'pmu'. There is
nothing changed for the non-hybrid platforms.

The is_x86_event() will be updated in the later patch ("perf/x86:
Register hybrid PMUs") for hybrid platforms. For the non-hybrid
platforms, nothing is changed here.
Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1618237865-33448-4-git-send-email-kan.liang@linux.intel.com

61e76d53

x86/cpu: Add helper function to get the type of the current hybrid CPU · 250b3c0d

由 Ricardo Neri 提交于 4月 12, 2021

On processors with Intel Hybrid Technology (i.e., one having more than
one type of CPU in the same package), all CPUs support the same
instruction set and enumerate the same features on CPUID. Thus, all
software can run on any CPU without restrictions. However, there may be
model-specific differences among types of CPUs. For instance, each type
of CPU may support a different number of performance counters. Also,
machine check error banks may be wired differently. Even though most
software will not care about these differences, kernel subsystems
dealing with these differences must know.

Add and expose a new helper function get_this_hybrid_cpu_type() to query
the type of the current hybrid CPU. The function will be used later in
the perf subsystem.

The Intel Software Developer's Manual defines the CPU type as 8-bit
identifier.
Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NTony Luck <tony.luck@intel.com>
Reviewed-by: NLen Brown <len.brown@intel.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1618237865-33448-3-git-send-email-kan.liang@linux.intel.com

250b3c0d

x86/cpufeatures: Enumerate Intel Hybrid Technology feature bit · a161545a

由 Ricardo Neri 提交于 4月 12, 2021

Add feature enumeration to identify a processor with Intel Hybrid
Technology: one in which CPUs of more than one type are the same package.
On a hybrid processor, all CPUs support the same homogeneous (i.e.,
symmetric) instruction set. All CPUs enumerate the same features in CPUID.
Thus, software (user space and kernel) can run and migrate to any CPU in
the system as well as utilize any of the enumerated features without any
change or special provisions. The main difference among CPUs in a hybrid
processor are power and performance properties.
Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NTony Luck <tony.luck@intel.com>
Reviewed-by: NLen Brown <len.brown@intel.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1618237865-33448-2-git-send-email-kan.liang@linux.intel.com

a161545a

17 4月, 2021 2 次提交

perf/amd/uncore: Fix sysfs type mismatch · 5deac80d

由 Nathan Chancellor 提交于 4月 14, 2021

dev_attr_show() calls the __uncore_*_show() functions via an indirect
call but their type does not currently match the type of the show()
member in 'struct device_attribute', resulting in a Control Flow
Integrity violation.

$ cat /sys/devices/amd_l3/format/umask
config:8-15

$ dmesg | grep "CFI failure"
[ 1258.174653] CFI failure (target: __uncore_umask_show...):

Update the type in the DEFINE_UNCORE_FORMAT_ATTR macro to match
'struct device_attribute' so that there is no more CFI violation.

Fixes: 06f2c245 ("perf/amd/uncore: Prepare to scale for more attributes that vary per family")
Signed-off-by: NNathan Chancellor <nathan@kernel.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210415001112.3024673-2-nathan@kernel.org

5deac80d

x86/events/amd/iommu: Fix sysfs type mismatch · de5bc7b4

由 Nathan Chancellor 提交于 4月 14, 2021

dev_attr_show() calls _iommu_event_show() via an indirect call but
_iommu_event_show()'s type does not currently match the type of the
show() member in 'struct device_attribute', resulting in a Control Flow
Integrity violation.

$ cat /sys/devices/amd_iommu_1/events/mem_dte_hit
csource=0x0a

$ dmesg | grep "CFI failure"
[ 3526.735140] CFI failure (target: _iommu_event_show...):

Change _iommu_event_show() and 'struct amd_iommu_event_desc' to
'struct device_attribute' so that there is no more CFI violation.

Fixes: 7be6296f ("perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation")
Signed-off-by: NNathan Chancellor <nathan@kernel.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210415001112.3024673-1-nathan@kernel.org

de5bc7b4

16 4月, 2021 2 次提交

perf/x86: Move cpuc->running into P4 specific code · 46ade474

由 Kan Liang 提交于 4月 14, 2021

The 'running' variable is only used in the P4 PMU. Current perf sets the
variable in the critical function x86_pmu_start(), which wastes cycles
for everybody not running on P4.

Move cpuc->running into the P4 specific p4_pmu_enable_event().

Add a static per-CPU 'p4_running' variable to replace the 'running'
variable in the struct cpu_hw_events. Saves space for the generic
structure.

The p4_pmu_enable_all() also invokes the p4_pmu_enable_event(), but it
should not set cpuc->running. Factor out __p4_pmu_enable_event() for
p4_pmu_enable_all().
Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1618410990-21383-1-git-send-email-kan.liang@linux.intel.com

46ade474

signal: Introduce TRAP_PERF si_code and si_perf to siginfo · fb6cc127

由 Marco Elver 提交于 4月 08, 2021

Introduces the TRAP_PERF si_code, and associated siginfo_t field
si_perf. These will be used by the perf event subsystem to send signals
(if requested) to the task where an event occurred.
Signed-off-by: NMarco Elver <elver@google.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Acked-by: Arnd Bergmann <arnd@arndb.de> # asm-generic
Link: https://lkml.kernel.org/r/20210408103605.1676875-6-elver@google.com

fb6cc127

02 4月, 2021 6 次提交

perf/x86/intel/uncore: Enable IIO stacks to PMON mapping for multi-segment SKX · cface032

由 Alexander Antonov 提交于 3月 23, 2021

IIO stacks to PMON mapping on Skylake servers is exposed through introduced
early attributes /sys/devices/uncore_iio_<pmu_idx>/dieX, where dieX is a
file which holds "Segment:Root Bus" for PCIe root port which can
be monitored by that IIO PMON block. These sysfs attributes are disabled
for multiple segment topologies except VMD domains which start at 0x10000.
This patch removes the limitation and enables IIO stacks to PMON mapping
for multi-segment Skylake servers by introducing segment-aware
intel_uncore_topology structure and attributing the topology configuration
to the segment in skx_iio_get_topology() function.
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NAlexander Antonov <alexander.antonov@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NKan Liang <kan.liang@linux.intel.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Tested-by: NKyle Meyer <kyle.meyer@hpe.com>
Link: https://lkml.kernel.org/r/20210323150507.2013-1-alexander.antonov@linux.intel.com

cface032

perf/x86/intel/uncore: Generic support for the MMIO type of uncore blocks · c4c55e36

由 Kan Liang 提交于 3月 17, 2021

The discovery table provides the generic uncore block information
for the MMIO type of uncore blocks, which is good enough to provide
basic uncore support.

The box control field is composed of the BAR address and box control
offset. When initializing the uncore blocks, perf should ioremap the
address from the box control field.

Implement the generic support for the MMIO type of uncore block.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-6-git-send-email-kan.liang@linux.intel.com

c4c55e36

perf/x86/intel/uncore: Generic support for the PCI type of uncore blocks · 42839ef4

由 Kan Liang 提交于 3月 17, 2021

The discovery table provides the generic uncore block information
for the PCI type of uncore blocks, which is good enough to provide
basic uncore support.

The PCI BUS and DEVFN information can be retrieved from the box control
field. Introduce the uncore_pci_pmus_register() to register all the
PCICFG type of uncore blocks. The old PCI probe/remove way is dropped.

The PCI BUS and DEVFN information are different among dies. Add box_ctls
to store the box control field of each die.

Add a new BUS notifier for the PCI type of uncore block to support the
hotplug. If the device is "hot remove", the corresponding registered PMU
has to be unregistered. Perf cannot locate the PMU by searching a const
pci_device_id table, because the discovery tables don't provide such
information. Introduce uncore_pci_find_dev_pmu_from_types() to search
the whole uncore_pci_uncores for the PMU.

Implement generic support for the PCI type of uncore block.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-5-git-send-email-kan.liang@linux.intel.com

42839ef4

perf/x86/intel/uncore: Rename uncore_notifier to uncore_pci_sub_notifier · 6477dc39

由 Kan Liang 提交于 3月 17, 2021

Perf will use a similar method to the PCI sub driver to register
the PMUs for the PCI type of uncore blocks. The method requires a BUS
notifier to support hotplug. The current BUS notifier cannot be reused,
because it searches a const id_table for the corresponding registered
PMU. The PCI type of uncore blocks in the discovery tables doesn't
provide an id_table.

Factor out uncore_bus_notify() and add the pointer of an id_table as a
parameter. The uncore_bus_notify() will be reused in the following
patch.

The current BUS notifier is only used by the PCI sub driver. Its name is
too generic. Rename it to uncore_pci_sub_notifier, which is specific for
the PCI sub driver.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-4-git-send-email-kan.liang@linux.intel.com

6477dc39

perf/x86/intel/uncore: Generic support for the MSR type of uncore blocks · d6c75413

由 Kan Liang 提交于 3月 17, 2021

The discovery table provides the generic uncore block information for
the MSR type of uncore blocks, e.g., the counter width, the number of
counters, the location of control/counter registers, which is good
enough to provide basic uncore support. It can be used as a fallback
solution when the kernel doesn't support a platform.

The name of the uncore box cannot be retrieved from the discovery table.
uncore_type_&typeID_&boxID will be used as its name. Save the type ID
and the box ID information in the struct intel_uncore_type.
Factor out uncore_get_pmu_name() to handle different naming methods.

Implement generic support for the MSR type of uncore block.

Some advanced features, such as filters and constraints, cannot be
retrieved from discovery tables. Features that rely on that
information are not be supported here.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-3-git-send-email-kan.liang@linux.intel.com

d6c75413

perf/x86/intel/uncore: Parse uncore discovery tables · edae1f06

由 Kan Liang 提交于 3月 17, 2021

A self-describing mechanism for the uncore PerfMon hardware has been
introduced with the latest Intel platforms. By reading through an MMIO
page worth of information, perf can 'discover' all the standard uncore
PerfMon registers in a machine.

The discovery mechanism relies on BIOS's support. With a proper BIOS,
a PCI device with the unique capability ID 0x23 can be found on each
die. Perf can retrieve the information of all available uncore PerfMons
from the device via MMIO. The information is composed of one global
discovery table and several unit discovery tables.
- The global discovery table includes global uncore information of the
die, e.g., the address of the global control register, the offset of
the global status register, the number of uncore units, the offset of
unit discovery tables, etc.
- The unit discovery table includes generic uncore unit information,
e.g., the access type, the counter width, the address of counters,
the address of the counter control, the unit ID, the unit type, etc.
The unit is also called "box" in the code.
Perf can provide basic uncore support based on this information
with the following patches.

To locate the PCI device with the discovery tables, check the generic
PCI ID first. If it doesn't match, go through the entire PCI device tree
and locate the device with the unique capability ID.

The uncore information is similar among dies. To save parsing time and
space, only completely parse and store the discovery tables on the first
die and the first box of each die. The parsed information is stored in
an
RB tree structure, intel_uncore_discovery_type. The size of the stored
discovery tables varies among platforms. It's around 4KB for a Sapphire
Rapids server.

If a BIOS doesn't support the 'discovery' mechanism, the uncore driver
will exit with -ENODEV. There is nothing changed.

Add a module parameter to disable the discovery feature. If a BIOS gets
the discovery tables wrong, users can have an option to disable the
feature. For the current patchset, the uncore driver will exit with
-ENODEV. In the future, it may fall back to the hardcode uncore driver
on a known platform.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-2-git-send-email-kan.liang@linux.intel.com

edae1f06

13 3月, 2021 4 次提交

KVM: LAPIC: Advancing the timer expiration on guest initiated write · 35737d2d

由 Wanpeng Li 提交于 3月 04, 2021

Advancing the timer expiration should only be necessary on guest initiated
writes. When we cancel the timer and clear .pending during state restore,
clear expired_tscdeadline as well.
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1614818118-965-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

35737d2d

KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode · 8df9f1af

由 Sean Christopherson 提交于 3月 09, 2021

If mmu_lock is held for write, don't bother setting !PRESENT SPTEs to
REMOVED_SPTE when recursively zapping SPTEs as part of shadow page
removal.  The concurrent write protections provided by REMOVED_SPTE are
not needed, there are no backing page side effects to record, and MMIO
SPTEs can be left as is since they are protected by the memslot
generation, not by ensuring that the MMIO SPTE is unreachable (which
is racy with respect to lockless walks regardless of zapping behavior).

Skipping !PRESENT drastically reduces the number of updates needed to
tear down sparsely populated MMUs, e.g. when tearing down a 6gb VM that
didn't touch much memory, 6929/7168 (~96.6%) of SPTEs were '0' and could
be skipped.

Avoiding the write itself is likely close to a wash, but avoiding
__handle_changed_spte() is a clear-cut win as that involves saving and
restoring all non-volatile GPRs (it's a subtly big function), as well as
several conditional branches before bailing out.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210310003029.1250571-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8df9f1af

KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged · d7eb79c6

由 Wanpeng Li 提交于 2月 24, 2021

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                88
On-line CPU(s) list:   0-63
Off-line CPU(s) list:  64-87

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.10.0-rc3-tlinux2-0050+ root=/dev/mapper/cl-root ro
rd.lvm.lv=cl/root rhgb quiet console=ttyS0 LANG=en_US .UTF-8 no-kvmclock-vsyscall

# echo 1 > /sys/devices/system/cpu/cpu76/online
-bash: echo: write error: Cannot allocate memory

The per-cpu vsyscall pvclock data pointer assigns either an element of the
static array hv_clock_boot (#vCPU <= 64) or dynamically allocated memory
hvclock_mem (vCPU > 64), the dynamically memory will not be allocated if
kvmclock vsyscall is disabled, this can result in cpu hotpluged fails in
kvmclock_setup_percpu() which returns -ENOMEM. It's broken for no-vsyscall
and sometimes you end up with vsyscall disabled if the host does something
strange. This patch fixes it by allocating this dynamically memory
unconditionally even if vsyscall is disabled.

Fixes: 6a1cac56 ("x86/kvm: Use __bss_decrypted attribute in shared variables")
Reported-by: NZelin Deng <zelin.deng@linux.alibaba.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: stable@vger.kernel.org#v4.19-rc5+
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1614130683-24137-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d7eb79c6

kvm: x86: annotate RCU pointers · 6fcd9cbc

由 Muhammad Usama Anjum 提交于 3月 06, 2021

This patch adds the annotation to fix the following sparse errors:
arch/x86/kvm//x86.c:8147:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//x86.c:8147:15: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//x86.c:8147:15: struct kvm_apic_map *
arch/x86/kvm//x86.c:10628:16: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//x86.c:10628:16: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//x86.c:10628:16: struct kvm_apic_map *
arch/x86/kvm//x86.c:10629:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//x86.c:10629:15: struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//x86.c:10629:15: struct kvm_pmu_event_filter *
arch/x86/kvm//lapic.c:267:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:267:15: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:267:15: struct kvm_apic_map *
arch/x86/kvm//lapic.c:269:9: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:269:9: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:269:9: struct kvm_apic_map *
arch/x86/kvm//lapic.c:637:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:637:15: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:637:15: struct kvm_apic_map *
arch/x86/kvm//lapic.c:994:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:994:15: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:994:15: struct kvm_apic_map *
arch/x86/kvm//lapic.c:1036:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:1036:15: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:1036:15: struct kvm_apic_map *
arch/x86/kvm//lapic.c:1173:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:1173:15: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:1173:15: struct kvm_apic_map *
arch/x86/kvm//pmu.c:190:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:190:18: struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:190:18: struct kvm_pmu_event_filter *
arch/x86/kvm//pmu.c:251:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:251:18: struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:251:18: struct kvm_pmu_event_filter *
arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter *
arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter *
Signed-off-by: NMuhammad Usama Anjum <musamaanjum@gmail.com>
Message-Id: <20210305191123.GA497469@LEGION>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6fcd9cbc

12 3月, 2021 1 次提交

objtool,x86: Fix uaccess PUSHF/POPF validation · ba08abca

由 Peter Zijlstra 提交于 3月 08, 2021

Commit ab234a26 ("x86/pv: Rework arch_local_irq_restore() to not
use popf") replaced "push %reg; popf" with something like: "test
$0x200, %reg; jz 1f; sti; 1:", which breaks the pushf/popf symmetry
that commit ea24213d ("objtool: Add UACCESS validation") relies
on.

The result is:

  drivers/gpu/drm/amd/amdgpu/si.o: warning: objtool: si_common_hw_init()+0xf36: PUSHF stack exhausted

Meanwhile, commit c9c324dc ("objtool: Support stack layout changes
in alternatives") makes that we can actually use stack-ops in
alternatives, which means we can revert 1ff865e3 ("x86,smap: Fix
smap_{save,restore}() alternatives").

That in turn means we can limit the PUSHF/POPF handling of
ea24213d to those instructions that are in alternatives.

Fixes: ab234a26 ("x86/pv: Rework arch_local_irq_restore() to not use popf")
Reported-by: NBorislav Petkov <bp@alien8.de>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/YEY4rIbQYa5fnnEp@hirez.programming.kicks-ass.net

ba08abca

11 3月, 2021 2 次提交

Xen/gnttab: introduce common INVALID_GRANT_{HANDLE,REF} · bce21a2b

由 Jan Beulich 提交于 3月 10, 2021

It's not helpful if every driver has to cook its own. Generalize
xenbus'es INVALID_GRANT_HANDLE and pcifront's INVALID_GRANT_REF (which
shouldn't have expanded to zero to begin with). Use the constants in
p2m.c and gntdev.c right away, and update field types where necessary so
they would match with the constants' types (albeit without touching
struct ioctl_gntdev_grant_ref's ref field, as that's part of the public
interface of the kernel and would require introducing a dependency on
Xen's grant_table.h public header).
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Reviewed-by: NJuergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/db7c38a5-0d75-d5d1-19de-e5fe9f0b9c48@suse.comSigned-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

bce21a2b

Xen: drop exports of {set,clear}_foreign_p2m_mapping() · 0f9b05b9

由 Jan Beulich 提交于 3月 09, 2021

They're only used internally, and the layering violation they contain
(x86) or imply (Arm) of calling HYPERVISOR_grant_table_op() strongly
advise against any (uncontrolled) use from a module. The functions also
never had users except the ones from drivers/xen/grant-table.c forever
since their introduction in 3.15.
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Reviewed-by: NStefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/746a5cd6-1446-eda4-8b23-03c1cac30b8d@suse.comSigned-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

0f9b05b9

10 3月, 2021 1 次提交

x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case · c8e2fe13

由 Sean Christopherson 提交于 3月 09, 2021

Initialize x86_pmu.guest_get_msrs to return 0/NULL to handle the "nop"
case.  Patching in perf_guest_get_msrs_nop() during setup does not work
if there is no PMU, as setup bails before updating the static calls,
leaving x86_pmu.guest_get_msrs NULL and thus a complete nop.  Ultimately,
this causes VMX abort on VM-Exit due to KVM putting random garbage from
the stack into the MSR load list.

Add a comment in KVM to note that nr_msrs is valid if and only if the
return value is non-NULL.

Fixes: abd562df ("x86/perf: Use static_call for x86_pmu.guest_get_msrs")
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+cce9ef2dd25246f815ee@syzkaller.appspotmail.com
Suggested-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210309171019.1125243-1-seanjc@google.com

c8e2fe13

09 3月, 2021 3 次提交

x86/sev-es: Use __copy_from_user_inatomic() · bffe30dd

由 Joerg Roedel 提交于 3月 03, 2021

The #VC handler must run in atomic context and cannot sleep. This is a
problem when it tries to fetch instruction bytes from user-space via
copy_from_user().

Introduce a insn_fetch_from_user_inatomic() helper which uses
__copy_from_user_inatomic() to safely copy the instruction bytes to
kernel memory in the #VC handler.

Fixes: 5e3427a7 ("x86/sev-es: Handle instruction fetches from user-space")
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-6-joro@8bytes.org

bffe30dd

x86/sev-es: Correctly track IRQ states in runtime #VC handler · 62441a1f

由 Joerg Roedel 提交于 3月 03, 2021

Call irqentry_nmi_enter()/irqentry_nmi_exit() in the #VC handler to
correctly track the IRQ state during its execution.

Fixes: 0786138c ("x86/sev-es: Add a Runtime #VC Exception Handler")
Reported-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-5-joro@8bytes.org

62441a1f

x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack · 545ac14c

由 Joerg Roedel 提交于 3月 03, 2021

The code in the NMI handler to adjust the #VC handler IST stack is
needed in case an NMI hits when the #VC handler is still using its IST
stack.

But the check for this condition also needs to look if the regs->sp
value is trusted, meaning it was not set by user-space. Extend the check
to not use regs->sp when the NMI interrupted user-space code or the
SYSCALL gap.

Fixes: 315562c9 ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Reported-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # 5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-3-joro@8bytes.org

545ac14c

08 3月, 2021 1 次提交

x86/sev-es: Introduce ip_within_syscall_gap() helper · 78a81d88

由 Joerg Roedel 提交于 3月 03, 2021

Introduce a helper to check whether an exception came from the syscall
gap and use it in the SEV-ES code. Extend the check to also cover the
compatibility SYSCALL entry path.

Fixes: 315562c9 ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # 5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-2-joro@8bytes.org

78a81d88

06 3月, 2021 3 次提交

x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls · 5d5675df

由 Andy Lutomirski 提交于 3月 04, 2021

On a 32-bit fast syscall that fails to read its arguments from user
memory, the kernel currently does syscall exit work but not
syscall entry work.  This confuses audit and ptrace.  For example:

    $ ./tools/testing/selftests/x86/syscall_arg_fault_32
    ...
    strace: pid 264258: entering, ptrace_syscall_info.op == 2
    ...

This is a minimal fix intended for ease of backporting.  A more
complete cleanup is coming.

Fixes: 0b085e68 ("x86/entry: Consolidate 32/64 bit syscall entry")
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/8c82296ddf803b91f8d1e5eac89e5803ba54ab0e.1614884673.git.luto@kernel.org

5d5675df

x86/unwind/orc: Silence warnings caused by missing ORC data · b59cc976

由 Josh Poimboeuf 提交于 2月 05, 2021

The ORC unwinder attempts to fall back to frame pointers when ORC data
is missing for a given instruction. It sets state->error, but then
tries to keep going as a best-effort type of thing. That may result in
further warnings if the unwinder gets lost.

Until we have some way to register generated code with the unwinder,
missing ORC will be expected, and occasionally going off the rails will
also be expected. So don't warn about it.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Tested-by: NIvan Babrou <ivan@cloudflare.com>
Link: https://lkml.kernel.org/r/06d02c4bbb220bd31668db579278b0352538efbb.1612534649.git.jpoimboe@redhat.com

b59cc976

x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2 · e504e74c

由 Josh Poimboeuf 提交于 2月 05, 2021

KASAN reserves "redzone" areas between stack frames in order to detect
stack overruns.  A read or write to such an area triggers a KASAN
"stack-out-of-bounds" BUG.

Normally, the ORC unwinder stays in-bounds and doesn't access the
redzone.  But sometimes it can't find ORC metadata for a given
instruction.  This can happen for code which is missing ORC metadata, or
for generated code.  In such cases, the unwinder attempts to fall back
to frame pointers, as a best-effort type thing.

This fallback often works, but when it doesn't, the unwinder can get
confused and go off into the weeds into the KASAN redzone, triggering
the aforementioned KASAN BUG.

But in this case, the unwinder's confusion is actually harmless and
working as designed.  It already has checks in place to prevent
off-stack accesses, but those checks get short-circuited by the KASAN
BUG.  And a BUG is a lot more disruptive than a harmless unwinder
warning.

Disable the KASAN checks by using READ_ONCE_NOCHECK() for all stack
accesses.  This finishes the job started by commit 881125bf
("x86/unwind: Disable KASAN checking in the ORC unwinder"), which only
partially fixed the issue.

Fixes: ee9f8fce ("x86/unwind: Add the ORC unwinder")
Reported-by: NIvan Babrou <ivan@cloudflare.com>
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: NIvan Babrou <ivan@cloudflare.com>
Cc: stable@kernel.org
Link: https://lkml.kernel.org/r/9583327904ebbbeda399eca9c56d6c7085ac20fe.1612534649.git.jpoimboe@redhat.com

e504e74c

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功