提交 · c76826a65f50038f050424365dbf3f97203f8710 · openeuler / Kernel

02 7月, 2021 15 次提交

perf/x86/intel/uncore: Support IMC free-running counters on Sapphire Rapids server · c76826a6

由 Kan Liang 提交于 6月 30, 2021

Several free-running counters for IMC uncore blocks are supported on
Sapphire Rapids server.

They are not enumerated in the discovery tables. The number of the
free-running counter boxes is calculated from the number of
corresponding standard boxes.

The snbep_pci2phy_map_init() is invoked to setup the mapping from a PCI
BUS to a Die ID, which is used to locate the corresponding MC device of
a IMC uncore unit in the spr_uncore_imc_freerunning_init_box().
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-16-git-send-email-kan.liang@linux.intel.com

c76826a6

perf/x86/intel/uncore: Support IIO free-running counters on Sapphire Rapids server · 0378c93a

由 Kan Liang 提交于 6月 30, 2021

Several free-running counters for IIO uncore blocks are supported on
Sapphire Rapids server.

They are not enumerated in the discovery tables. Extend
generic_init_uncores() to support extra uncore types. The uncore types
for the free-running counters is inserted right after the uncore types
retrieved from the discovery table.

The number of the free-running counter boxes is calculated from the max
number of the corresponding standard boxes.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-15-git-send-email-kan.liang@linux.intel.com

0378c93a

perf/x86/intel/uncore: Factor out snr_uncore_mmio_map() · 1583971b

由 Kan Liang 提交于 6月 30, 2021

The IMC free-running counters on Sapphire Rapids server are also
accessed by MMIO, which is similar to the previous platforms, SNR and
ICX. The only difference is the device ID of the device which contains
BAR address.

Factor out snr_uncore_mmio_map() which ioremap the MMIO space. It can be
reused in the following patch for SPR.

Use the snr_uncore_mmio_map() in the icx_uncore_imc_freerunning_init_box().
There is no box_ctl for the free-running counters.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-14-git-send-email-kan.liang@linux.intel.com

1583971b

perf/x86/intel/uncore: Add alias PMU name · 8053f2d7

由 Kan Liang 提交于 6月 30, 2021

A perf PMU may have two PMU names. For example, Intel Sapphire Rapids
server supports the discovery mechanism. Without the platform-specific
support, an uncore PMU is named by a type ID plus a box ID, e.g.,
uncore_type_0_0, because the real name of the uncore PMU cannot be
retrieved from the discovery table. With the platform-specific support
later, perf has the mapping information from a type ID to a specific
uncore unit. Just like the previous platforms, the uncore PMU is named
by the real PMU name, e.g., uncore_cha_0. The user scripts which work
well with the old numeric name may not work anymore.

Add a new attribute "alias" to indicate the old numeric name. The
following userspace perf tool patch will handle both names. The user
scripts should work properly with the updated perf tool.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-13-git-send-email-kan.liang@linux.intel.com

8053f2d7

perf/x86/intel/uncore: Add Sapphire Rapids server MDF support · 0d771caf

由 Kan Liang 提交于 6月 30, 2021

The MDF subsystem is a new IP built to support the new Intel Xeon
architecture that bridges multiple dies with a embedded bridge system.

The layout of the control registers for a MDF uncore unit is similar to
a IRP uncore unit.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-12-git-send-email-kan.liang@linux.intel.com

0d771caf

perf/x86/intel/uncore: Add Sapphire Rapids server M3UPI support · 2a8e51ea

由 Kan Liang 提交于 6月 30, 2021

M3 Intel UPI is the interface between the mesh and the Intel UPI link
layer. It is responsible for translating between the mesh protocol
packets and the flits that are used for transmitting data across the
Intel UPI interface.

The layout of the control registers for a M3UPI uncore unit is similar
to a UPI uncore unit.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-11-git-send-email-kan.liang@linux.intel.com

2a8e51ea

perf/x86/intel/uncore: Add Sapphire Rapids server UPI support · da5a9156

由 Kan Liang 提交于 6月 30, 2021

Sapphire Rapids uses a coherent interconnect for scaling to multiple
sockets known as Intel UPI. Intel UPI technology provides a cache
coherent socket to socket external communication interface between
processors.

The layout of the control registers for a UPI uncore unit is similar to
a M2M uncore unit.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-10-git-send-email-kan.liang@linux.intel.com

da5a9156

perf/x86/intel/uncore: Add Sapphire Rapids server M2M support · f57191ed

由 Kan Liang 提交于 6月 30, 2021

The M2M blocks manage the interface between the mesh (operating on both
the mesh and the SMI3 protocol) and the memory controllers.

The layout of the control registers for a M2M uncore unit is a little
bit different from the generic one. So a specific format and ops are
required. Expose the common PCI ops which can be reused.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-9-git-send-email-kan.liang@linux.intel.com

f57191ed

perf/x86/intel/uncore: Add Sapphire Rapids server IMC support · 85f2e30f

由 Kan Liang 提交于 6月 30, 2021

The Sapphire Rapids IMC provides the interface to the DRAM and
communicates to the rest of the uncore through the M2M block.

The layout of the control registers for a IMC uncore unit is a little
bit different from the generic one. There is a fixed counter for IMC.
So a specific format and ops are required. Expose the common MMIO ops
which can be reused.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-8-git-send-email-kan.liang@linux.intel.com

85f2e30f

perf/x86/intel/uncore: Add Sapphire Rapids server PCU support · 0654dfdc

由 Kan Liang 提交于 6月 30, 2021

The PCU is the primary power controller for the Sapphire Rapids.

Except the name, all the information can be retrieved from the discovery
tables.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-7-git-send-email-kan.liang@linux.intel.com

0654dfdc

perf/x86/intel/uncore: Add Sapphire Rapids server M2PCIe support · f85ef898

由 Kan Liang 提交于 6月 30, 2021

M2PCIe* blocks manage the interface between the mesh and each IIO stack.

The layout of the control registers for a M2PCIe uncore unit is similar
to a IRP uncore unit.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-6-git-send-email-kan.liang@linux.intel.com

f85ef898

perf/x86/intel/uncore: Add Sapphire Rapids server IRP support · e199eb51

由 Kan Liang 提交于 6月 30, 2021

The IRP is responsible for maintaining coherency for the IIO traffic
targeting coherent memory.

The layout of the control registers for a IRP uncore unit is a little
bit different from the generic one.

Factor out SPR_UNCORE_COMMON_FORMAT, which can be reused by the
following uncore units.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-5-git-send-email-kan.liang@linux.intel.com

e199eb51

perf/x86/intel/uncore: Add Sapphire Rapids server IIO support · 3ba7095b

由 Kan Liang 提交于 6月 30, 2021

The IIO stacks are responsible for managing the traffic between the PCI
Express* (PCIe*) domain and the mesh domain. The IIO PMON block is
situated near the IIO stacks traffic controller capturing the traffic
controller as well as the PCIe* root port information.

The layout of the control registers for a IIO uncore unit is a little
bit different from the generic one.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-4-git-send-email-kan.liang@linux.intel.com

3ba7095b

perf/x86/intel/uncore: Add Sapphire Rapids server CHA support · 949b1138

由 Kan Liang 提交于 6月 30, 2021

CHA merges the caching agent and Home Agent (HA) responsibilities of the
chip into a single block. It's one of the Sapphire Rapids server uncore
units.

The layout of the control registers for a CHA uncore unit is a little
bit different from the generic one. The CHA uncore unit also supports a
filter register for TID. So a specific format and ops are required.
Expose the common MSR ops which can be reused.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-3-git-send-email-kan.liang@linux.intel.com

949b1138

perf/x86/intel/uncore: Add Sapphire Rapids server framework · c54c53d9

由 Kan Liang 提交于 6月 30, 2021

Intel Sapphire Rapids supports a discovery mechanism, that allows an
uncore driver to discover the different components ("boxes") of the
chip.

All the generic information of the uncore boxes should be retrieved from
the discovery tables. This has been enabled with the commit edae1f06
("perf/x86/intel/uncore: Parse uncore discovery tables"). Add
use_discovery to indicate the case. The uncore driver doesn't need to
hard code the generic information for each uncore box.
But we still need to enable various functionality that cannot be
directly discovered.

To support these functionalities, the Sapphire Rapids server framework
is introduced here. Each specific uncore unit will be added into the
framework in the following patches.

Add use_discovery to indicate that the discovery mechanism is required
for the platform. Currently, Intel Sapphire Rapids is one of the
platforms.

The box ID from the discovery table is the accurate index. Use it if
applicable.

All the undiscovered platform-specific features will be hard code in the
spr_uncores[]. Add uncore_type_customized_copy(), instead of the memcpy,
to only overwrite these features.

The specific uncore unit hasn't been added here. From user's
perspective, there is nothing changed for now.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1625087320-194204-2-git-send-email-kan.liang@linux.intel.com

c54c53d9

24 6月, 2021 5 次提交

perf: Fix task context PMU for Hetero · 012669c7

由 Peter Zijlstra 提交于 6月 22, 2021

On HETEROGENEOUS hardware (ARM big.Little, Intel Alderlake etc.) each
CPU might have a different hardware PMU. Since each such PMU is
represented by a different struct pmu, but we only have a single HW
task context.

That means that the task context needs to switch PMU type when it
switches CPUs.

Not doing this means that ctx->pmu calls (pmu_{dis,en}able(),
{start,commit,cancel}_txn() etc.) are called against the wrong PMU and
things will go wobbly.

Fixes: f83d2f91 ("perf/x86/intel: Add Alder Lake Hybrid support")
Reported-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NKan Liang <kan.liang@linux.intel.com>
Link: https://lkml.kernel.org/r/YMsy7BuGT8nBTspT@hirez.programming.kicks-ass.net

012669c7

perf/x86/intel: Fix instructions:ppp support in Sapphire Rapids · 1d5c7880

由 Kan Liang 提交于 6月 18, 2021

Perf errors out when sampling instructions:ppp.

$ perf record -e instructions:ppp -- true
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (instructions:ppp).

The instruction PDIR is only available on the fixed counter 0. The event
constraint has been updated to fixed0_constraint in
icl_get_event_constraints(). The Sapphire Rapids codes unconditionally
error out for the event which is not available on the GP counter 0.

Make the instructions:ppp an exception.

Fixes: 61b985e3 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids")
Reported-by: NYasin, Ahmad <ahmad.yasin@intel.com>
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1624029174-122219-4-git-send-email-kan.liang@linux.intel.com

1d5c7880

perf/x86/intel: Add more events requires FRONTEND MSR on Sapphire Rapids · d18216fa

由 Kan Liang 提交于 6月 18, 2021

On Sapphire Rapids, there are two more events 0x40ad and 0x04c2 which
rely on the FRONTEND MSR. If the FRONTEND MSR is not set correctly, the
count value is not correct.

Update intel_spr_extra_regs[] to support them.

Fixes: 61b985e3 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids")
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1624029174-122219-3-git-send-email-kan.liang@linux.intel.com

d18216fa

perf/x86/intel: Fix fixed counter check warning for some Alder Lake · ee72a94e

由 Kan Liang 提交于 6月 18, 2021

For some Alder Lake machine, the below fixed counter check warning may be
triggered.

[    2.010766] hw perf events fixed 5 > max(4), clipping!

Current perf unconditionally increases the number of the GP counters and
the fixed counters for a big core PMU on an Alder Lake system, because
the number enumerated in the CPUID only reflects the common counters.
The big core may has more counters. However, Alder Lake may have an
alternative configuration. With that configuration,
the X86_FEATURE_HYBRID_CPU is not set. The number of the GP counters and
fixed counters enumerated in the CPUID is accurate. Perf mistakenly
increases the number of counters. The warning is triggered.

Directly use the enumerated value on the system with the alternative
configuration.

Fixes: f83d2f91 ("perf/x86/intel: Add Alder Lake Hybrid support")
Reported-by: NJin Yao <yao.jin@linux.intel.com>
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1624029174-122219-2-git-send-email-kan.liang@linux.intel.com

ee72a94e

perf/x86/intel: Fix PEBS-via-PT reload base value for Extended PEBS · 4c58d922

由 Like Xu 提交于 6月 21, 2021

If we use the "PEBS-via-PT" feature on a platform that supports
extended PBES, like this:

    perf record -c 10000 \
    -e '{intel_pt/branch=0/,branch-instructions/aux-output/p}' uname

we will encounter the following call trace:

[  250.906542] unchecked MSR access error: WRMSR to 0x14e1 (tried to write
0x0000000000000000) at rIP: 0xffffffff88073624 (native_write_msr+0x4/0x20)
[  250.920779] Call Trace:
[  250.923508]  intel_pmu_pebs_enable+0x12c/0x190
[  250.928359]  intel_pmu_enable_event+0x346/0x390
[  250.933300]  x86_pmu_start+0x64/0x80
[  250.937231]  x86_pmu_enable+0x16a/0x2f0
[  250.941434]  perf_event_exec+0x144/0x4c0
[  250.945731]  begin_new_exec+0x650/0xbf0
[  250.949933]  load_elf_binary+0x13e/0x1700
[  250.954321]  ? lock_acquire+0xc2/0x390
[  250.958430]  ? bprm_execve+0x34f/0x8a0
[  250.962544]  ? lock_is_held_type+0xa7/0x120
[  250.967118]  ? find_held_lock+0x32/0x90
[  250.971321]  ? sched_clock_cpu+0xc/0xb0
[  250.975527]  bprm_execve+0x33d/0x8a0
[  250.979452]  do_execveat_common.isra.0+0x161/0x1d0
[  250.984673]  __x64_sys_execve+0x33/0x40
[  250.988877]  do_syscall_64+0x3d/0x80
[  250.992806]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  250.998302] RIP: 0033:0x7fbc971d82fb
[  251.002235] Code: Unable to access opcode bytes at RIP 0x7fbc971d82d1.
[  251.009303] RSP: 002b:00007fffb8aed808 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[  251.017478] RAX: ffffffffffffffda RBX: 00007fffb8af2f00 RCX: 00007fbc971d82fb
[  251.025187] RDX: 00005574792aac50 RSI: 00007fffb8af2f00 RDI: 00007fffb8aed810
[  251.032901] RBP: 00007fffb8aed970 R08: 0000000000000020 R09: 00007fbc9725c8b0
[  251.040613] R10: 6d6c61632f6d6f63 R11: 0000000000000202 R12: 00005574792aac50
[  251.048327] R13: 00007fffb8af35f0 R14: 00005574792aafdf R15: 00005574792aafe7

This is because the target reload msr address is calculated
based on the wrong base msr and the target reload msr value
is accessed from ds->pebs_event_reset[] with the wrong offset.

According to Intel SDM Table 2-14, for extended PBES feature,
the reload msr for MSR_IA32_FIXED_CTRx should be based on
MSR_RELOAD_FIXED_CTRx.

For fixed counters, let's fix it by overriding the reload msr
address and its value, thus avoiding out-of-bounds access.

Fixes: 42880f72("perf/x86/intel: Support PEBS output to PT")
Signed-off-by: NLike Xu <likexu@tencent.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210621034710.31107-1-likexu@tencent.com

4c58d922

17 6月, 2021 1 次提交

perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task · 5471eea5

由 Kan Liang 提交于 6月 14, 2021

The counter value of a perf task may leak to another RDPMC task.
For example, a perf stat task as below is running on CPU 0.

perf stat -e 'branches,cycles' -- taskset -c 0 ./workload

In the meantime, an RDPMC task, which is also running on CPU 0, may read
the GP counters periodically. (The RDPMC task creates a fixed event,
but read four GP counters.)

$./rdpmc_read_all_counters
index 0x0 value 0x8001e5970f99
index 0x1 value 0x8005d750edb6
index 0x2 value 0x0
index 0x3 value 0x0

index 0x0 value 0x8002358e48a5
index 0x1 value 0x8006bd1e3bc9
index 0x2 value 0x0
index 0x3 value 0x0

It is a potential security issue. Once the attacker knows what the other
thread is counting. The PerfMon counter can be used as a side-channel to
attack cryptosystems.

The counter value of the perf stat task leaks to the RDPMC task because
perf never clears the counter when it's stopped.

Three methods were considered to address the issue.

- Unconditionally reset the counter in x86_pmu_del(). It can bring extra
overhead even when there is no RDPMC task running.

- Only reset the un-assigned dirty counters when the RDPMC task is
scheduled in via sched_task(). It fails for the below case.

Thread A Thread B

clone(CLONE_THREAD) --->
set_affine(0)
set_affine(1)
while (!event-enabled)
;
event = perf_event_open()
mmap(event)
ioctl(event, IOC_ENABLE); --->
RDPMC

Counters are still leaked to the thread B.

- Only reset the un-assigned dirty counters before updating the CR4.PCE
bit. The method is implemented here.

The dirty counter is a counter, on which the assigned event has been
deleted, but the counter is not reset. To track the dirty counters,
add a 'dirty' variable in the struct cpu_hw_events.

The security issue can only be found with an RDPMC task. To enable the
RDMPC, the CR4.PCE bit has to be updated. Add a
perf_clear_dirty_counters() right before updating the CR4.PCE bit to
clear the existing dirty counters. Only the current un-assigned dirty
counters are reset, because the RDPMC assigned dirty counters will be
updated soon.

After applying the patch,

$ ./rdpmc_read_all_counters
index 0x0 value 0x0
index 0x1 value 0x0
index 0x2 value 0x0
index 0x3 value 0x0

index 0x0 value 0x0
index 0x1 value 0x0
index 0x2 value 0x0
index 0x3 value 0x0

Performance

The performance of a context switch only be impacted when there are two
or more perf users and one of the users must be an RDPMC user. In other
cases, there is no performance impact.

The worst-case occurs when there are two users: the RDPMC user only
uses one counter; while the other user uses all available counters.
When the RDPMC task is scheduled in, all the counters, other than the
RDPMC assigned one, have to be reset.

Test results for the worst-case, using a modified lat_ctx as measured
on an Ice Lake platform, which has 8 GP and 3 FP counters (ignoring
SLOTS).

lat_ctx -s 128K -N 1000 processes 2

Without the patch:
The context switch time is 4.97 us

With the patch:
The context switch time is 5.16 us

There is ~4% performance drop for the context switching time in the
worst-case.
Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1623693582-187370-1-git-send-email-kan.liang@linux.intel.com

5471eea5

03 6月, 2021 1 次提交

kprobes: Do not increment probe miss count in the fault handler · 2e38eb04

由 Naveen N. Rao 提交于 6月 01, 2021

Kprobes has a counter 'nmissed', that is used to count the number of
times a probe handler was not called. This generally happens when we hit
a kprobe while handling another kprobe.

However, if one of the probe handlers causes a fault, we are currently
incrementing 'nmissed'. The comment in fault handler indicates that this
can be used to account faults taken by the probe handlers. But, this has
never been the intention as is evident from the comment above 'nmissed'
in 'struct kprobe':

	/*count the number of times this probe was temporarily disarmed */
	unsigned long nmissed;
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20210601120150.672652-1-naveen.n.rao@linux.vnet.ibm.com

2e38eb04

01 6月, 2021 3 次提交

x86,kprobes: WARN if kprobes tries to handle a fault · 00afe830

由 Peter Zijlstra 提交于 5月 25, 2021

With the removal of kprobe::handle_fault there is no reason left that
kprobe_page_fault() would ever return true on x86, make sure it
doesn't happen by accident.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210525073213.660594073@infradead.org

00afe830

kprobes: Remove kprobe::fault_handler · ec6aba3d

由 Peter Zijlstra 提交于 5月 25, 2021

The reason for kprobe::fault_handler(), as given by their comment:

 * We come here because instructions in the pre/post
 * handler caused the page_fault, this could happen
 * if handler tries to access user space by
 * copy_from_user(), get_user() etc. Let the
 * user-specified handler try to fix it first.

Is just plain bad. Those other handlers are ran from non-preemptible
context and had better use _nofault() functions. Also, there is no
upstream usage of this.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210525073213.561116662@infradead.org

ec6aba3d

uprobes: Update uprobe_write_opcode() kernel-doc comment · 9ce4d216

由 Qiujun Huang 提交于 5月 24, 2021

commit 6d43743e ("Uprobe: Additional argument arch_uprobe to
uprobe_write_opcode()") added the parameter @auprobe.
Signed-off-by: NQiujun Huang <hqjagain@gmail.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210524041411.157027-1-hqjagain@gmail.com

9ce4d216

27 5月, 2021 3 次提交

perf/hw_breakpoint: Fix DocBook warnings in perf hw_breakpoint · 875dd7bf

由 Haocheng Xie 提交于 5月 27, 2021

Fix the following W=1 kernel build warning(s):

kernel/events/hw_breakpoint.c:461: warning: Function parameter or member 'context' not described in 'register_user_hw_breakpoint'
kernel/events/hw_breakpoint.c:560: warning: Function parameter or member 'context' not described in 'register_wide_hw_breakpoint'
Signed-off-by: NHaocheng Xie <xiehaocheng.cn@gmail.com>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210527031947.1801-4-xiehaocheng.cn@gmail.com

875dd7bf

perf/core: Fix DocBook warnings · a1ddf524

由 Haocheng Xie 提交于 5月 27, 2021

Fix the following W=1 kernel build warning(s):

kernel/events/core.c:143: warning: Function parameter or member 'cpu' not described in 'cpu_function_call'
kernel/events/core.c:11924: warning: Function parameter or member 'flags' not described in 'sys_perf_event_open'
kernel/events/core.c:12382: warning: Function parameter or member 'overflow_handler' not described in 'perf_event_create_kernel_counter'
kernel/events/core.c:12382: warning: Function parameter or member 'context' not described in 'perf_event_create_kernel_counter'
Signed-off-by: NHaocheng Xie <xiehaocheng.cn@gmail.com>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210527031947.1801-3-xiehaocheng.cn@gmail.com

a1ddf524

perf/core: Make local function perf_pmu_snapshot_aux() static · 32961aec

由 Haocheng Xie 提交于 5月 27, 2021

Fixes the following W=1 kernel build warning:

kernel/events/core.c:6670:6: warning: no previous prototype for 'perf_pmu_snapshot_aux' [-Wmissing-prototypes]
Signed-off-by: NHaocheng Xie <xiehaocheng.cn@gmail.com>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210527031947.1801-2-xiehaocheng.cn@gmail.com

32961aec

18 5月, 2021 3 次提交

perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX · 10337e95

由 Alexander Antonov 提交于 4月 26, 2021

This patch enables I/O stacks to IIO PMON mapping on Icelake server.

Mapping of IDs in SAD_CONTROL_CFG notation to IDs in PMON notation for
Icelake server:

Stack Name         | CBDMA/DMI | PCIe_1 | PCIe_2 | PCIe_3 | PCIe_4 | PCIe_5
SAD_CONTROL_CFG ID |     0     |    1   |    2   |    3   |    4   |    5
PMON ID            |     5     |    0   |    1   |    2   |    3   |    4

I/O stacks to IIO PMON mapping is exposed through attributes
/sys/devices/uncore_iio_<pmu_idx>/dieX, where dieX is file which holds
"Segment:Root Bus" for PCIe root port which can be monitored by that
IIO PMON block. Example for 2-S Icelake server:

==> /sys/devices/uncore_iio_0/die0 <==
0000:16
==> /sys/devices/uncore_iio_0/die1 <==
0000:97
==> /sys/devices/uncore_iio_1/die0 <==
0000:30
==> /sys/devices/uncore_iio_1/die1 <==
0000:b0
==> /sys/devices/uncore_iio_3/die0 <==
0000:4a
==> /sys/devices/uncore_iio_3/die1 <==
0000:c9
==> /sys/devices/uncore_iio_4/die0 <==
0000:64
==> /sys/devices/uncore_iio_4/die1 <==
0000:e2
==> /sys/devices/uncore_iio_5/die0 <==
0000:00
==> /sys/devices/uncore_iio_5/die1 <==
0000:80
Signed-off-by: NAlexander Antonov <alexander.antonov@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NKan Liang <kan.liang@linux.intel.com>
Link: https://lkml.kernel.org/r/20210426131614.16205-4-alexander.antonov@linux.intel.com

10337e95

perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNR · c1777be3

由 Alexander Antonov 提交于 4月 26, 2021

I/O stacks to PMON mapping on Skylake server relies on topology information
from CPU_BUS_NO MSR but this approach is not applicable for SNR and ICX.
Mapping on these platforms can be gotten by reading SAD_CONTROL_CFG CSR
from Mesh2IIO device with 0x09a2 DID.
SAD_CONTROL_CFG CSR contains stack IDs in its own notation which are
statically mapped on IDs in PMON notation.

The map for Snowridge:

Stack Name         | CBDMA/DMI | PCIe Gen 3 | DLB | NIS | QAT
SAD_CONTROL_CFG ID |     0     |      1     |  2  |  3  |  4
PMON ID            |     1     |      4     |  3  |  2  |  0

This patch enables I/O stacks to IIO PMON mapping on Snowridge.
Mapping is exposed through attributes /sys/devices/uncore_iio_<pmu_idx>/dieX,
where dieX is file which holds "Segment:Root Bus" for PCIe root port which
can be monitored by that IIO PMON block. Example for Snowridge:

==> /sys/devices/uncore_iio_0/die0 <==
0000:f3
==> /sys/devices/uncore_iio_1/die0 <==
0000:00
==> /sys/devices/uncore_iio_2/die0 <==
0000:eb
==> /sys/devices/uncore_iio_3/die0 <==
0000:e3
==> /sys/devices/uncore_iio_4/die0 <==
0000:14

Mapping for Icelake server will be enabled in the follow-up patch.
Signed-off-by: NAlexander Antonov <alexander.antonov@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NKan Liang <kan.liang@linux.intel.com>
Link: https://lkml.kernel.org/r/20210426131614.16205-3-alexander.antonov@linux.intel.com

c1777be3

perf/x86/intel/uncore: Generalize I/O stacks to PMON mapping procedure · f471fac7

由 Alexander Antonov 提交于 4月 26, 2021

Currently I/O stacks to IIO PMON mapping is available on Skylake servers
only and need to make code more general to easily enable further platforms.
So, introduce get_topology() callback in struct intel_uncore_type which
allows to move common code to separate function and make mapping procedure
more general.
Signed-off-by: NAlexander Antonov <alexander.antonov@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NKan Liang <kan.liang@linux.intel.com>
Link: https://lkml.kernel.org/r/20210426131614.16205-2-alexander.antonov@linux.intel.com

f471fac7

12 5月, 2021 2 次提交

perf/x86/intel/uncore: Drop unnecessary NULL checks after container_of() · 440e9067

由 Guenter Roeck 提交于 5月 10, 2021

The parameter passed to the pmu_enable() and pmu_disable() functions can not be
NULL because it is dereferenced by the caller.

That means the result of container_of() on that parameter can also never be NULL.
The existing NULL checks are therefore unnecessary and misleading. Remove them.

This change was made automatically with the following Coccinelle script.

  @@
  type t;
  identifier v;
  statement s;
  @@

  <+...
  (
    t v = container_of(...);
  |
    v = container_of(...);
  )
    ...
    when != v
  - if (\( !v \| v == NULL \) ) s
  ...+>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210510224849.2349861-1-linux@roeck-us.net

440e9067

Merge tag 'for-5.13-rc1-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 88b06399

由 Linus Torvalds 提交于 5月 11, 2021

Pull btrfs fix from David Sterba:
 "Handle transaction start error in btrfs_fileattr_set()

  This is fix for code introduced by the new fileattr merge"

* tag 'for-5.13-rc1-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: handle transaction start error in btrfs_fileattr_set

88b06399

11 5月, 2021 4 次提交

btrfs: handle transaction start error in btrfs_fileattr_set · 9b8a233b

由 Ritesh Harjani 提交于 4月 30, 2021

Add error handling in btrfs_fileattr_set in case of an error while
starting a transaction. This fixes btrfs/232 which otherwise used to
fail with below signature on Power.

  btrfs/232 [ 1119.474650] run fstests btrfs/232 at 2021-04-21 02:21:22
  <...>
  [ 1366.638585] BUG: Unable to handle kernel data access on read at 0xffffffffffffff86
  [ 1366.638768] Faulting instruction address: 0xc0000000009a5c88
  cpu 0x0: Vector: 380 (Data SLB Access) at [c000000014f177b0]
      pc: c0000000009a5c88: btrfs_update_root_times+0x58/0xc0
      lr: c0000000009a5c84: btrfs_update_root_times+0x54/0xc0
      <...>
      pid   = 24881, comm = fsstress
	   btrfs_update_inode+0xa0/0x140
	   btrfs_fileattr_set+0x5d0/0x6f0
	   vfs_fileattr_set+0x2a8/0x390
	   do_vfs_ioctl+0x1290/0x1ac0
	   sys_ioctl+0x6c/0x120
	   system_call_exception+0x3d4/0x410
	   system_call_common+0xec/0x278

Fixes: 97fc2977 ("btrfs: convert to fileattr")
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

9b8a233b

Merge tag 'perf-tools-fixes-for-v5.13-2021-05-10' of... · 1140ab59

由 Linus Torvalds 提交于 5月 10, 2021

Merge tag 'perf-tools-fixes-for-v5.13-2021-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix swapping of cpu_map and stat_config records.

 - Fix dynamic libbpf linking.

 - Disallow -c and -F option at the same time in 'perf record'.

 - Update headers with the kernel originals.

 - Silence warning for JSON ArchStd files.

 - Fix a build error on arm64 with clang.

* tag 'perf-tools-fixes-for-v5.13-2021-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  tools headers UAPI: Sync perf_event.h with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  tools include UAPI powerpc: Sync errno.h with the kernel headers
  tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
  tools headers UAPI: Sync linux/prctl.h with the kernel sources
  tools headers UAPI: Sync files changed by landlock, quotactl_path and mount_settattr new syscalls
  perf tools: Fix a build error on arm64 with clang
  tools headers kvm: Sync kvm headers with the kernel sources
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  perf tools: Fix dynamic libbpf link
  perf session: Fix swapping of cpu_map and stat_config records
  perf jevents: Silence warning for ArchStd files
  perf record: Disallow -c and -F option at the same time
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
  tools headers UAPI: Update tools's copy of drm.h headers

1140ab59

Merge tag 'for-5.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 142b507f

由 Linus Torvalds 提交于 5月 10, 2021

Pull btrfs fixes from David Sterba:
 "First batch of various fixes, here's a list of notable ones:

   - fix unmountable seed device after fstrim

   - fix silent data loss in zoned mode due to ordered extent splitting

   - fix race leading to unpersisted data and metadata on fsync

   - fix deadlock when cloning inline extents and using qgroups"

* tag 'for-5.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: initialize return variable in cleanup_free_space_cache_v1
  btrfs: zoned: sanity check zone type
  btrfs: fix unmountable seed device after fstrim
  btrfs: fix deadlock when cloning inline extents and using qgroups
  btrfs: fix race leading to unpersisted data and metadata on fsync
  btrfs: do not consider send context as valid when trying to flush qgroups
  btrfs: zoned: fix silent data loss after failure splitting ordered extent

142b507f

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 0aa099a3

由 Linus Torvalds 提交于 5月 10, 2021

Pull kvm fixes from Paolo Bonzini:

 - Lots of bug fixes.

 - Fix virtualization of RDPID

 - Virtualization of DR6_BUS_LOCK, which on bare metal is new to this
   release

 - More nested virtualization migration fixes (nSVM and eVMCS)

 - Fix for KVM guest hibernation

 - Fix for warning in SEV-ES SRCU usage

 - Block KVM from loading on AMD machines with 5-level page tables, due
   to the APM not mentioning how host CR4.LA57 exactly impacts the
   guest.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (48 commits)
  KVM: SVM: Move GHCB unmapping to fix RCU warning
  KVM: SVM: Invert user pointer casting in SEV {en,de}crypt helpers
  kvm: Cap halt polling at kvm->max_halt_poll_ns
  tools/kvm_stat: Fix documentation typo
  KVM: x86: Prevent deadlock against tk_core.seq
  KVM: x86: Cancel pvclock_gtod_work on module removal
  KVM: x86: Prevent KVM SVM from loading on kernels with 5-level paging
  KVM: X86: Expose bus lock debug exception to guest
  KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit
  KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks
  KVM: x86: Hide RDTSCP and RDPID if MSR_TSC_AUX probing failed
  KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model
  KVM: x86: Move uret MSR slot management to common x86
  KVM: x86: Export the number of uret MSRs to vendor modules
  KVM: VMX: Disable loading of TSX_CTRL MSR the more conventional way
  KVM: VMX: Use common x86's uret MSR list as the one true list
  KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list
  KVM: VMX: Configure list of user return MSRs at module init
  KVM: x86: Add support for RDPID without RDTSCP
  KVM: SVM: Probe and load MSR_TSC_AUX regardless of RDTSCP support in host
  ...

0aa099a3

10 5月, 2021 3 次提交

tools headers UAPI: Sync perf_event.h with the kernel sources · 71d7924b

由 Arnaldo Carvalho de Melo 提交于 5月 09, 2021

To pick up the changes in:

  2b26f0aa ("perf: Support only inheriting events if cloned with CLONE_THREAD")
  2e498d0a ("perf: Add support for event removal on exec")
  547b6098 ("perf: aux: Add flags for the buffer format")
  55bcf6ef ("perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE")
  7dde5176 ("perf: aux: Add CoreSight PMU buffer formats")
  97ba62b2 ("perf: Add support for SIGTRAP on perf events")
  d0d1dd62 ("perf core: Add PERF_COUNT_SW_CGROUP_SWITCHES event")

Also change the expected sizeof(struct perf_event_attr) from 120 to 128 due to
fields being added for the SIGTRAP changes.

Addressing this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
  diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h

Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Marco Elver <elver@google.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

71d7924b

tools headers cpufeatures: Sync with the kernel sources · 6faf64f5

由 Arnaldo Carvalho de Melo 提交于 5月 09, 2021

To pick the changes from:

  4e629211 ("x86/paravirt: Add new features for paravirt patching")
  a161545a ("x86/cpufeatures: Enumerate Intel Hybrid Technology feature bit")
  a89dfde3 ("x86: Remove dynamic NOP selection")
  b8921dcc ("x86/cpufeatures: Add SGX1 and SGX2 sub-features")
  f21d4d3b ("x86/cpufeatures: Enumerate #DB for bus lock detection")
  f333374e ("x86/cpufeatures: Add the Virtual SPEC_CTRL feature")

This only causes these perf files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Babu Moger <babu.moger@amd.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

6faf64f5

tools include UAPI powerpc: Sync errno.h with the kernel headers · 39163293

由 Arnaldo Carvalho de Melo 提交于 5月 09, 2021

To pick the change in:

7de21e67 ("powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h")

That will make the errno number -> string tables to pick this change on powerpc.

Silencing this perf build warning:

Warning: Kernel ABI header at 'tools/arch/powerpc/include/uapi/asm/errno.h' differs from latest version at 'arch/powerpc/include/uapi/asm/errno.h'
diff -u tools/arch/powerpc/include/uapi/asm/errno.h arch/powerpc/include/uapi/asm/errno.h

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

39163293

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功