提交 · 74528edfbc664f9d2c927c4e5a44f1285598ed0f · openeuler / Kernel

21 1月, 2023 1 次提交

intel_idle: add Emerald Rapids Xeon support · 74528edf

由 Artem Bityutskiy 提交于 1月 20, 2023

Emerald Rapids (EMR) is the next Intel Xeon processor after Sapphire
Rapids (SPR).

EMR C-states are the same as SPR C-states, and we expect that EMR
C-state characteristics (latency and target residency) will be the
same as in SPR. Therefore, add EMR support by using SPR C-states table.
Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

74528edf

22 9月, 2022 1 次提交

intel_idle: Add AlderLake-N support · 65c0c236

由 Zhang Rui 提交于 9月 20, 2022

Similar to the other other AlderLake platforms, the C1 and C1E states on
ADL-N are mutually exclusive. Only one of them can be enabled at a time.

C1E is preferred on ADL-N for better energy efficiency.

C6S is also supported on this platform. Its latency is far bigger than
C6, but really close to C8 (PC8), thus it is not exposed as a separate
state.
Suggested-by: NBaieswara Reddy Sagili <baieswara.reddy.sagili@intel.com>
Suggested-by: NVinay Kumar <vinay.kumar@intel.com>
Signed-off-by: NZhang Rui <rui.zhang@intel.com>
[ rjw: Changelog edits ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

65c0c236

01 9月, 2022 1 次提交

intel_idle: move from strlcpy() with unused retval to strscpy() · 0dbc0f49

由 Wolfram Sang 提交于 8月 18, 2022

Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/Signed-off-by: NWolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

0dbc0f49

26 7月, 2022 2 次提交

intel_idle: make SPR C1 and C1E be independent · 1548fac4

由 Artem Bityutskiy 提交于 7月 16, 2022

This patch partially reverts the changes made by the following commit:

da0e58c0 intel_idle: add 'preferred_cstates' module argument

As that commit describes, on early Sapphire Rapids Xeon platforms the C1 and
C1E states were mutually exclusive, so that users could only have either C1 and
C6, or C1E and C6.

However, Intel firmware engineers managed to remove this limitation and make C1
and C1E to be completely independent, just like on previous Xeon platforms.

Therefore, this patch:
 * Removes commentary describing the old, and now non-existing SPR C1E
   limitation.
 * Marks SPR C1E as available by default.
 * Removes the 'preferred_cstates' parameter handling for SPR. Both C1 and
   C1E will be available regardless of 'preferred_cstates' value.

We expect that all SPR systems are shipping with new firmware, which includes
the C1/C1E improvement.

Cc: v5.18+ <stable@vger.kernel.org> # v5.18+
Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

1548fac4

intel_idle: Fix false positive RCU splats due to incorrect hardirqs state · d295ad34

由 Waiman Long 提交于 7月 23, 2022

Commit 32d4fd57 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
uses raw_local_irq_enable/local_irq_disable() around call to
__intel_idle() in intel_idle_irq().

With interrupt enabled, timer tick interrupt can happen and a
subsequently call to __do_softirq() may change the lockdep hardirqs state
of a debug kernel back to 'on'. This will result in a mismatch between
the cpu hardirqs state (off) and the lockdep hardirqs state (on) causing
a number of false positive "WARNING: suspicious RCU usage" splats.

Fix that by using local_irq_disable() to disable interrupt in
intel_idle_irq().

Fixes: 32d4fd57 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
Signed-off-by: NWaiman Long <longman@redhat.com>
Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

d295ad34

20 7月, 2022 1 次提交

intel_idle: Add a new flag to initialize the AMX state · 9f011293

由 Chang S. Bae 提交于 7月 18, 2022

The non-initialized AMX state can be the cause of C-state demotion from C6
to C1E. This low-power idle state may improve power savings and thus result
in a higher available turbo frequency budget.

This behavior is implementation-specific. Initialize the state for the C6
entrance of Sapphire Rapids as needed.
Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NChang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: NZhang Rui <rui.zhang@intel.com>
Link: https://lkml.kernel.org/r/20220614164116.5196-1-chang.seok.bae@intel.com

9f011293

27 6月, 2022 1 次提交

intel_idle: Disable IBRS during long idle · bf5835bc

由 Peter Zijlstra 提交于 6月 14, 2022

Having IBRS enabled while the SMT sibling is idle unnecessarily slows
down the running sibling. OTOH, disabling IBRS around idle takes two
MSR writes, which will increase the idle latency.

Therefore, only disable IBRS around deeper idle states. Shallow idle
states are bounded by the tick in duration, since NOHZ is not allowed
for them by virtue of their short target residency.

Only do this for mwait-driven idle, since that keeps interrupts disabled
across idle, which makes disabling IBRS vs IRQ-entry a non-issue.

Note: C6 is a random threshold, most importantly C1 probably shouldn't
disable IBRS, benchmarking needed.
Suggested-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NJosh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: NBorislav Petkov <bp@suse.de>

bf5835bc

09 6月, 2022 1 次提交

cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE · 32d4fd57

由 Peter Zijlstra 提交于 6月 08, 2022

Commit c227233a ("intel_idle: enable interrupts before C1 on
Xeons") wrecked intel_idle in two ways:

 - must not have tracing in idle functions
 - must return with IRQs disabled

Additionally, it added a branch for no good reason.

Fixes: c227233a ("intel_idle: enable interrupts before C1 on Xeons")
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
[ rjw: Moved the intel_idle() kerneldoc comment next to the function ]
Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

32d4fd57

28 4月, 2022 3 次提交

intel_idle: Add AlderLake support · d1cf8bbf

由 Zhang Rui 提交于 4月 15, 2022

Similar to SPR, the C1 and C1E states on ADL are mutually exclusive.
Only one of them can be enabled at a time.

But contrast to SPR, which usually has a strong latency requirement
as a Xeon processor, C1E is preferred on ADL for better energy
efficiency.

Add custom C-state tables for ADL with both C1 and C1E, and

 1. Enable the "C1E promotion" bit in MSR_IA32_POWER_CTL and mark C1
    with the CPUIDLE_FLAG_UNUSABLE flag, so C1 is not available by
    default.

 2. Add support for the "preferred_cstates" module parameter, so that
    users can choose to use C1 instead of C1E by booting with
    "intel_idle.preferred_cstates=2".

Separate custom C-state tables are introduced for the ADL mobile and
desktop processors, because of the exit latency differences between
these two variants, especially with respect to PC10.
Signed-off-by: NZhang Rui <rui.zhang@intel.com>
[ rjw: Changelog edits, code rearrangement ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

d1cf8bbf

intel_idle: Fix SPR C6 optimization · 7eac3bd3

由 Artem Bityutskiy 提交于 4月 27, 2022

The Sapphire Rapids (SPR) C6 optimization was added to the end of the
'spr_idle_state_table_update()' function. However, the function has a
'return' which may happen before the optimization has a chance to run.
And this may prevent the optimization from happening.

This is an unlikely scenario, but possible if user boots with, say,
the 'intel_idle.preferred_cstates=6' kernel boot option.

This patch fixes the issue by eliminating the problematic 'return'
statement.

Fixes: 3a9cf77b ("intel_idle: add core C6 optimization for SPR")
Suggested-by: NJan Beulich <jbeulich@suse.com>
Reported-by: NJan Beulich <jbeulich@suse.com>
Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Minor changelog edits ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

7eac3bd3

intel_idle: Fix the 'preferred_cstates' module parameter · 39c184a6

由 Artem Bityutskiy 提交于 4月 27, 2022

Problem description.

When user boots kernel up with the 'intel_idle.preferred_cstates=4' option,
we enable C1E and disable C1 states on Sapphire Rapids Xeon (SPR). In order
for C1E to work on SPR, we have to enable the C1E promotion bit on all
CPUs.  However, we enable it only on one CPU.

Fix description.

The 'intel_idle' driver already has the infrastructure for disabling C1E
promotion on every CPU. This patch uses the same infrastructure for
enabling C1E promotion on every CPU. It changes the boolean
'disable_promotion_to_c1e' variable to a tri-state 'c1e_promotion'
variable.

Tested on a 2-socket SPR system. I verified the following combinations:

 * C1E promotion enabled and disabled in BIOS.
 * Booted with and without the 'intel_idle.preferred_cstates=4' kernel
   argument.

In all 4 cases C1E promotion was correctly set on all CPUs.

Also tested on an old Broadwell system, just to make sure it does not cause
a regression. C1E promotion was correctly disabled on that system, both C1
and C1E were exposed (as expected).

Fixes: da0e58c0 ("intel_idle: add 'preferred_cstates' module argument")
Reported-by: NJan Beulich <jbeulich@suse.com>
Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Minor changelog edits ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

39c184a6

17 3月, 2022 2 次提交

cpuidle: intel_idle: Drop redundant backslash at line end · 03eb6522

由 Rafael J. Wysocki 提交于 3月 15, 2022

Drop a redundant backslash character at the end of a line in the
spr_cstates[] definition.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>

03eb6522

cpuidle: intel_idle: Update intel_idle() kerneldoc comment · a335b1e6

由 Rafael J. Wysocki 提交于 3月 15, 2022

Commit bf9282dc ("cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic")
moved the leave_mm() call away from intel_idle(), but it didn't update
its kerneldoc comment accordingly, so do that now.

Fixes: bf9282dc ("cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic")
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

a335b1e6

05 3月, 2022 3 次提交

intel_idle: add core C6 optimization for SPR · 3a9cf77b

由 Artem Bityutskiy 提交于 3月 02, 2022

Add a Sapphire Rapids Xeon C6 optimization, similar to what we have for Sky Lake
Xeon: if package C6 is disabled, adjust C6 exit latency and target residency to
match core C6 values, instead of using the default package C6 values.
Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

3a9cf77b

intel_idle: add 'preferred_cstates' module argument · da0e58c0

由 Artem Bityutskiy 提交于 3月 02, 2022

On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
exclusive - only one of them can be enabled. By default, 'intel_idle' driver
enables C1 and disables C1E. However, some users prefer to use C1E instead of
C1, because it saves more energy.

This patch adds a new module parameter ('preferred_cstates') for enabling C1E
and disabling C1. Here is the idea behind it.

1. This option has effect only for "mutually exclusive" C-states like C1 and
C1E on SPR.
2. It does not have any effect on independent C-states, which do not require
other C-states to be disabled (most states on most platforms as of today).
3. For mutually exclusive C-states, the 'intel_idle' driver always has a
reasonable default, such as enabling C1 on SPR by default. On other
platforms, the default may be different.
4. Users can override the default using the 'preferred_cstates' parameter.
5. The parameter accepts the preferred C-states bit-mask, similarly to the
existing 'states_off' parameter.
6. This parameter is not limited to C1/C1E, and leaves room for supporting
other mutually exclusive C-states, if they come in the future.

Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
to disable C1 and enable C1E, users should boot with the following kernel
argument: intel_idle.preferred_cstates=4
Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

da0e58c0

intel_idle: add SPR support · 9edf3c0f

由 Artem Bityutskiy 提交于 3月 02, 2022

Add Sapphire Rapids Xeon support.

Up until very recently, the C1 and C1E C-states were independent, but this
has changed in some new chips, including Sapphire Rapids Xeon (SPR). In these
chips the C1 and C1E states cannot be enabled at the same time. The "C1E
promotion" bit in 'MSR_IA32_POWER_CTL' also has its semantics changed a bit.

Here are the C1, C1E, and "C1E promotion" bit rules on Xeons before SPR.

1. If C1E promotion bit is disabled.
   a. C1  requests end up with C1  C-state.
   b. C1E requests end up with C1E C-state.
2. If C1E promotion bit is enabled.
   a. C1  requests end up with C1E C-state.
   b. C1E requests end up with C1E C-state.

Here are the C1, C1E, and "C1E promotion" bit rules on Sapphire Rapids Xeon.
1. If C1E promotion bit is disabled.
   a. C1  requests end up with C1 C-state.
   b. C1E requests end up with C1 C-state.
2. If C1E promotion bit is enabled.
   a. C1  requests end up with C1E C-state.
   b. C1E requests end up with C1E C-state.

Before SPR Xeon, the 'intel_idle' driver was disabling C1E promotion and was
exposing C1 and C1E as independent C-states. But on SPR, C1 and C1E cannot be
enabled at the same time.

This patch adds both C1 and C1E states. However, C1E is marked as with the
"CPUIDLE_FLAG_UNUSABLE" flag, which means that in won't be registered by
default. The C1E promotion bit will be cleared, which means that by default
only C1 and C6 will be registered on SPR.

The next patch will add an option for enabling C1E and disabling C1 on SPR.
Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

9edf3c0f

25 9月, 2021 1 次提交

intel_idle: enable interrupts before C1 on Xeons · c227233a

由 Artem Bityutskiy 提交于 9月 17, 2021

Enable local interrupts before requesting C1 on the last two generations
of Intel Xeon platforms: Sky Lake, Cascade Lake, Cooper Lake, Ice Lake.
This decreases average C1 interrupt latency by about 5-10%, as measured
with the 'wult' tool.

The '->enter()' function of the driver enters C-states with local
interrupts disabled by executing the 'monitor' and 'mwait' pair of
instructions. If an interrupt happens, the CPU exits the C-state and
continues executing instructions after 'mwait'. It does not jump to
the interrupt handler, because local interrupts are disabled. The
cpuidle subsystem enables interrupts a bit later, after doing some
housekeeping.

With this patch, we enable local interrupts before requesting C1. In
this case, if the CPU wakes up because of an interrupt, it will jump
to the interrupt handler right away. The cpuidle housekeeping will be
done after the pending interrupt(s) are handled.

Enabling interrupts before entering a C-state has measurable impact
for faster C-states, like C1. Deeper, but slower C-states like C6 do
not really benefit from this sort of change, because their latency is
a lot higher comparing to the delay added by cpuidle housekeeping.

This change was also tested with cyclictest and dbench. In case of Ice
Lake, the average cyclictest latency decreased by 5.1%, and the average
'dbench' throughput increased by about 0.8%. Both tests were run for 4
hours with only C1 enabled (all other idle states, including 'POLL',
were disabled). CPU frequency was pinned to HFM, and uncore frequency
was pinned to the maximum value. The other platforms had similar
single-digit percentage improvements.

It is worth noting that this patch affects 'cpuidle' statistics a tiny
bit. Before this patch, C1 residency did not include the interrupt
handling time, but with this patch, it will include it. This is similar
to what happens in case of the 'POLL' state, which also runs with
interrupts enabled.
Suggested-by: NLen Brown <len.brown@intel.com>
Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

c227233a

09 6月, 2021 1 次提交

intel_idle: Adjust the SKX C6 parameters if PC6 is disabled · 64233338

由 Chen Yu 提交于 5月 28, 2021

Because cpuidle assumes worst-case C-state parameters, PC6 parameters
are used for describing C6, which is worst-case for requesting CC6.
When PC6 is enabled, this is appropriate. But if PC6 is disabled
in the BIOS, the exit latency and target residency should be adjusted
accordingly.

Exit latency:
Previously the C6 exit latency was measured as the PC6 exit latency.
With PC6 disabled, the C6 exit latency should be the one of CC6.

Target residency:
With PC6 disabled, the idle duration within [CC6, PC6) would make the
idle governor choose C1E over C6. This would cause low energy-efficiency.
We should lower the bar to request C6 when PC6 is disabled.

To fill this gap, check if PC6 is disabled in the BIOS in the
MSR_PKG_CST_CONFIG_CONTROL(0xe2) register. If so, use the CC6 exit latency
for C6 and set target_residency to 3 times of the new exit latency. [This
is consistent with how intel_idle driver uses _CST to calculate the
target_residency.] As a result, the OS would be more likely to choose C6
over C1E when PC6 is disabled, which is reasonable, because if C6 is
enabled, it implies that the user cares about energy, so choosing C6 more
frequently makes sense.

The new CC6 exit latency of 92us was measured with wult[1] on SKX via NIC
wakeup as the 99.99th percentile. Also CLX and CPX both have the same CPU
model number as SkX, but their CC6 exit latencies are similar to the SKX
one, 96us and 89us respectively, so reuse the SKX value for them.

There is a concern that it might be better to use a more generic approach
instead of optimizing every platform. However, if the required code
complexity and different PC6 bit interpretation on different platforms
are taken into account, tuning the code per platform seems to be an
acceptable tradeoff.

Link: https://intel.github.io/wult/ # [1]
Suggested-by: NLen Brown <len.brown@intel.com>
Signed-off-by: NChen Yu <yu.c.chen@intel.com>
Reviewed-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

64233338

09 4月, 2021 1 次提交

intel_idle: add Iclelake-D support · 22141d5f

由 Artem Bityutskiy 提交于 4月 07, 2021

This patch adds Icelake Xeon D support to the intel_idle driver.

Since Icelake D and Icelake SP C-state characteristics the same,
we use Icelake SP C-states table for Icelake D as well.
Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: NChen Yu <yu.c.chen@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

22141d5f

19 3月, 2021 1 次提交

intel_idle: update ICX C6 data · d484b8bf

由 Artem Bityutskiy 提交于 3月 08, 2021

Change IceLake Xeon C6 latency from 128 us to 170 us. The latency
was measured with the "wult" tool and corresponds to the 99.99th
percentile when measuring with the "nic" method. Note, the 128 us
figure correspond to the median latency, but in intel_idle we use
the "worst case" latency figure instead.

C6 target residency was increased from 384 us to 600 us, which may
result in less C6 residency in some workloads. This value was tested
and compared to values 384, and 1000. Value 600 is a reasonable
tradeoff between power and performance.
Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: NZhang Rui <rui.zhang@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

d484b8bf

22 1月, 2021 1 次提交

intel_idle: remove definition of DEBUG · 651bc581

由 Tom Rix 提交于 1月 15, 2021

Defining DEBUG should only be done in development.
So remove DEBUG.
Signed-off-by: NTom Rix <trix@redhat.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

651bc581

31 12月, 2020 1 次提交

intel_idle: add SnowRidge C-state table · 9cf93f05

由 Artem Bityutskiy 提交于 12月 27, 2020

Add C-state table for the SnowRidge SoC which is found on Intel Jacobsville
platforms.

The following has been changed.

 1. C1E latency changed from 10us to 15us. It was measured using the
    open source "wult" tool (the "nic" method, 15us is the 99.99th
    percentile).

 2. C1E power break even changed from 20us to 25us, which may result
    in less C1E residency in some workloads.

 3. C6 latency changed from 50us to 130us. Measured the same way as C1E.

The C6 C-state is supported only by some SnowRidge revisions, so add a C-state
table commentary about this.

On SnowRidge, C6 support is enumerated via the usual mechanism: "mwait" leaf of
the "cpuid" instruction. The 'intel_idle' driver does check this leaf, so even
though C6 is present in the table, the driver will only use it if the CPU does
support it.
Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

9cf93f05

03 12月, 2020 1 次提交

intel_idle: Build fix · 4d916140

由 Peter Zijlstra 提交于 11月 30, 2020

Because CONFIG_ soup.

Fixes: 6e1d2bc6 ("intel_idle: Fix intel_idle() vs tracing")
Reported-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201130115402.GO3040@hirez.programming.kicks-ass.net

4d916140

24 11月, 2020 1 次提交

intel_idle: Fix intel_idle() vs tracing · 6e1d2bc6

由 Peter Zijlstra 提交于 11月 20, 2020

cpuidle->enter() callbacks should not call into tracing because RCU
has already been disabled. Instead of doing the broadcast thing
itself, simply advertise to the cpuidle core that those states stop
the timer.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20201123143510.GR3021@hirez.programming.kicks-ass.net

6e1d2bc6

28 10月, 2020 1 次提交

intel_idle: Fix max_cstate for processor models without C-state tables · 4e0ba557

由 Chen Yu 提交于 10月 25, 2020

Currently intel_idle driver gets the c-state information from ACPI
_CST if the processor model is not recognized by it. However the
c-state in _CST starts with index 1 which is different from the
index in intel_idle driver's internal c-state table.

While intel_idle_max_cstate_reached() was previously introduced to
deal with intel_idle driver's internal c-state table, re-using
this function directly on _CST is incorrect.

Fix this by subtracting 1 from the index when checking max_cstate
in the _CST case.

For example, append intel_idle.max_cstate=1 in boot command line,
Before the patch:
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
POLL
After the patch:
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI

Fixes: 18734958 ("intel_idle: Use ACPI _CST for processor models without C-state tables")
Reported-by: NPengfei Xu <pengfei.xu@intel.com>
Cc: 5.6+ <stable@vger.kernel.org> # 5.6+
Signed-off-by: NChen Yu <yu.c.chen@intel.com>
[ rjw: Changelog edits ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

4e0ba557

16 10月, 2020 2 次提交

intel_idle: Ignore _CST if control cannot be taken from the platform · 75af76d0

由 Mel Gorman 提交于 10月 16, 2020

e6d4f08a ("intel_idle: Use ACPI _CST on server systems") avoids
enabling c-states that have been disabled by the platform with the
exception of C1E.

Unfortunately, BIOS implementations are not always consistent in terms
of how capabilities are advertised and control cannot always be handed
over. If control cannot be handed over then intel_idle reports that "ACPI
_CST not found or not usable" but does not clear acpi_state_table.count
meaning the information is still partially used.

This patch ignores ACPI information if CST control cannot be requested from
the platform. This was only observed on a number of Haswell platforms that
had identical CPUs but not identical BIOS versions. While this problem
may be rare overall, 24 separate test cases bisected to this specific
commit across 4 separate test machines and is worth addressing. If the
situation occurs, the kernel behaves as it did before commit e6d4f08a
and uses any c-states that are discovered.

The affected test cases were all ones that involved a small number of
processes -- exec microbenchmark, pipe microbenchmark, git test suite,
netperf, tbench with one client and system call microbenchmark. Each
case benefits from being able to use turboboost which is prevented if the
lower c-states are unavailable. This may mask real regressions specific
to older hardware so it is worth addressing.

C-state status before and after the patch

5.9.0-vanilla POLL latency:0 disabled:0 default:enabled
5.9.0-vanilla C1 latency:2 disabled:0 default:enabled
5.9.0-vanilla C1E latency:10 disabled:0 default:enabled
5.9.0-vanilla C3 latency:33 disabled:1 default:disabled
5.9.0-vanilla C6 latency:133 disabled:1 default:disabled
5.9.0-ignore-cst-v1r1 POLL latency:0 disabled:0 default:enabled
5.9.0-ignore-cst-v1r1 C1 latency:2 disabled:0 default:enabled
5.9.0-ignore-cst-v1r1 C1E latency:10 disabled:0 default:enabled
5.9.0-ignore-cst-v1r1 C3 latency:33 disabled:0 default:enabled
5.9.0-ignore-cst-v1r1 C6 latency:133 disabled:0 default:enabled

Patch enables C3/C6.

Netperf UDP_STREAM

netperf-udp
5.5.0 5.9.0
vanilla ignore-cst-v1r1
Hmean send-64 193.41 ( 0.00%) 226.54 * 17.13%*
Hmean send-128 392.16 ( 0.00%) 450.54 * 14.89%*
Hmean send-256 769.94 ( 0.00%) 881.85 * 14.53%*
Hmean send-1024 2994.21 ( 0.00%) 3468.95 * 15.85%*
Hmean send-2048 5725.60 ( 0.00%) 6628.99 * 15.78%*
Hmean send-3312 8468.36 ( 0.00%) 10288.02 * 21.49%*
Hmean send-4096 10135.46 ( 0.00%) 12387.57 * 22.22%*
Hmean send-8192 17142.07 ( 0.00%) 19748.11 * 15.20%*
Hmean send-16384 28539.71 ( 0.00%) 30084.45 * 5.41%*

Fixes: e6d4f08a ("intel_idle: Use ACPI _CST on server systems")
Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
Cc: 5.6+ <stable@vger.kernel.org> # 5.6+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

75af76d0

intel_idle: mention assumption that WBINVD is not needed · 8bb2e2a8

由 Alexander Monakov 提交于 10月 12, 2020

Intel SDM does not explicitly say that entering a C-state via MWAIT will
implicitly flush CPU caches as appropriate for that C-state. However,
documentation for individual Intel CPU generations does mention this
behavior.

Since intel_idle binds to any Intel CPU with MWAIT, list this assumption
of MWAIT behavior.

In passing, reword opening comment to make it clear that the driver can
load on any old and future Intel CPU with MWAIT.
Signed-off-by: NAlexander Monakov <amonakov@ispras.ru>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

8bb2e2a8

26 8月, 2020 1 次提交

cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic · bf9282dc

由 Peter Zijlstra 提交于 8月 12, 2020

This allows moving the leave_mm() call into generic code before
rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: NMarco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org

bf9282dc

30 7月, 2020 2 次提交

intel_idle: Customize IceLake server support · a472ad2b

由 Chen Yu 提交于 7月 10, 2020

On ICX platform, the C1E auto-promotion is enabled by default.
As a result, the CPU might fall into C1E more offen than previous
platforms. Besides, the C1E is not exposed to sysfs on ICX, which
is inconsistent with previous server platforms.

So disable C1E auto-promotion and expose C1E as a separate idle
state, so the C1E and C6 can be disabled via sysfs when necessary.

Beside C1 and C1E, the exit latency of C6 was measured
by a dedicated tool. However the exit latency(41us) exposed
by _CST is much smaller than the one we measured(128us). This
is probably due to the _CST uses the exit latency when woken
up from PC0+C6, rather than PC6+C6 when C6 was measured. Choose
the latter as we need the longest latency in theory.
Reported-by: Nkernel test robot <lkp@intel.com>
Tested-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: NZhang Rui <rui.zhang@intel.com>
Signed-off-by: NChen Yu <yu.c.chen@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

a472ad2b

cpuidle: change enter_s2idle() prototype · efe97112

由 Neal Liu 提交于 7月 27, 2020

Control Flow Integrity(CFI) is a security mechanism that disallows
changes to the original control flow graph of a compiled binary,
making it significantly harder to perform such attacks.

init_state_node() assign same function callback to different
function pointer declarations.

static int init_state_node(struct cpuidle_state *idle_state,
                           const struct of_device_id *matches,
                           struct device_node *state_node) { ...
        idle_state->enter = match_id->data; ...
        idle_state->enter_s2idle = match_id->data; }

Function declarations:

struct cpuidle_state { ...
        int (*enter) (struct cpuidle_device *dev,
                      struct cpuidle_driver *drv,
                      int index);

        void (*enter_s2idle) (struct cpuidle_device *dev,
                              struct cpuidle_driver *drv,
                              int index); };

In this case, either enter() or enter_s2idle() would cause CFI check
failed since they use same callee.

Align function prototype of enter() since it needs return value for
some use cases. The return value of enter_s2idle() is no
need currently.
Signed-off-by: NNeal Liu <neal.liu@mediatek.com>
Reviewed-by: NSami Tolvanen <samitolvanen@google.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

efe97112

17 7月, 2020 1 次提交

treewide: Remove uninitialized_var() usage · 3f649ab7

由 Kees Cook 提交于 6月 03, 2020

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: NKees Cook <keescook@chromium.org>

3f649ab7

29 6月, 2020 1 次提交

intel_idle: Eliminate redundant static variable · dab20177

由 Rafael J. Wysocki 提交于 6月 29, 2020

The value of the lapic_timer_always_reliable static variable in
the intel_idle driver reflects the boot_cpu_has(X86_FEATURE_ARAT)
value and so it also reflects the static_cpu_has(X86_FEATURE_ARAT)
value.

Hence, the lapic_timer_always_reliable check in intel_idle() is
redundant and apart from this lapic_timer_always_reliable is only
used in two places in which boot_cpu_has(X86_FEATURE_ARAT) can be
used directly.

Eliminate the lapic_timer_always_reliable variable in accordance
with the above observations.

No intentional functional impact.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

dab20177

25 3月, 2020 1 次提交

intel_idle: Convert to new X86 CPU match macros · 4a9f45a0

由 Thomas Gleixner 提交于 3月 20, 2020

The new macro set has a consistent namespace and uses C99 initializers
instead of the grufty C89 ones.

Get rid the of the local macro wrappers for consistency.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131510.193755545@linutronix.de

4a9f45a0

12 2月, 2020 7 次提交

intel_idle: Update copyright notice, known limitations and version · 317e5ec3

由 Rafael J. Wysocki 提交于 2月 06, 2020

Update the copyright notice in intel_idle.c to cover the recent
changes, drop the description of a "known limitation" that is not
a limitation any more and bump up the driver version number.

No functional impact.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

317e5ec3

intel_idle: Define CPUIDLE_FLAG_TLB_FLUSHED as BIT(16) · a472e4b5

由 Rafael J. Wysocki 提交于 2月 06, 2020

Use the BIT() macro for defining CPUIDLE_FLAG_TLB_FLUSHED instead of
the hex bit encoding of the same value.

No intentional functional impact.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

a472e4b5

intel_idle: Clean up kerneldoc comments for multiple functions · 6eacb15f

由 Rafael J. Wysocki 提交于 2月 06, 2020

Turn the description comments of some functions in the intel_idle
driver into proper kerneldoc ones and clean them up.

No functional impact.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

6eacb15f

intel_idle: Reorder declarations of static variables · 6eb0443a

由 Rafael J. Wysocki 提交于 2月 06, 2020

Reorder declarations of static variables so that the __initdata ones
are declared together.

No functional impact.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

6eb0443a

intel_idle: Annotate init time data structures · ab1a8522

由 Rafael J. Wysocki 提交于 2月 06, 2020

Add __initdata or __initconst annotations to the static data
structures that are only used during the initialization of the
driver.

No intentional functional impact.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

ab1a8522

intel_idle: Add __initdata annotations to init time variables · 7f843dd7

由 Rafael J. Wysocki 提交于 2月 06, 2020

Annotate static variables cpuidle_state_table and mwait_substates
with __initdata, because they are only used during the initialization
of the driver.

Also notice that static variable icpu could be annotated analogously
and the structure pointed to by it could be __initconst, but two of
its fields are accessed via icpu in intel_idle_cpu_init() and
auto_demotion_disable(), so introduce two new static variables,
auto_demotion_disable_flags and disable_promotion_to_c1e, to hold
the values of these fields, set them during the initialization and
use them in those functions instead of accessing the source data
structure via icpu.

That allows icpu to be annotated with __initdata, so do that,
and it will also allow some __initconst annotations to be added
subsequently.

No intentional functional impact.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

7f843dd7

intel_idle: Relocate definitions of cpuidle callbacks · 30a996fb

由 Rafael J. Wysocki 提交于 2月 06, 2020

Move the definitions of intel_idle() and intel_idle_s2idle() before
the definitions of cpuidle_state structures referring to them to
avoid having to use additional declarations of them (and drop those
declarations).

No functional impact.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

30a996fb

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功