1. 05 Jul 2021, 1 commit
  2. 01 May 2021, 1 commit
  3. 17 Apr 2021, 1 commit
    • powerpc/xive: Use the "ibm,chip-id" property only under PowerNV · e9e16917
      Cédric Le Goater committed
      The 'chip_id' field of the XIVE CPU structure is used to choose a
      target for a source located on the same chip. For that, the XIVE
      driver queries the chip identifier from the "ibm,chip-id" property
      and compares it to a 'src_chip' field identifying the chip of a
      source. This information is only available on the PowerNV platform,
      'src_chip' being assigned to XIVE_INVALID_CHIP_ID under pSeries.
      
      The "ibm,chip-id" property is also not available on all platforms. It
      was first introduced on PowerNV and later, under QEMU for pSeries/KVM.
      However, the property is not part of PAPR and does not exist under
      pSeries/PowerVM.
      
      Assign 'chip_id' to XIVE_INVALID_CHIP_ID by default and let the
      PowerNV platform override the value with the "ibm,chip-id" property.
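      As an illustration, here is a condensed C sketch of the resulting
      logic. The helper name and its shape are hypothetical; the real
      change is spread across xive_prepare_cpu() and a PowerNV-only
      prepare_cpu() hook:

        /* Hypothetical condensed helper illustrating the fix. */
        static void xive_setup_chip_id(struct xive_cpu *xc, bool is_powernv,
                                       struct device_node *np)
        {
                u32 chip_id;

                /* Safe default everywhere, notably on pSeries/PowerVM
                 * where "ibm,chip-id" is absent (not part of PAPR). */
                xc->chip_id = XIVE_INVALID_CHIP_ID;

                /* Only PowerNV is guaranteed to carry the property. */
                if (is_powernv &&
                    !of_property_read_u32(np, "ibm,chip-id", &chip_id))
                        xc->chip_id = chip_id;
        }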
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210413130352.1183267-1-clg@kaod.org
      e9e16917
  4. 14 Apr 2021, 8 commits
  5. 11 Dec 2020, 8 commits
  6. 18 Sep 2020, 1 commit
  7. 28 May 2020, 1 commit
  8. 26 May 2020, 1 commit
    • powerpc/xive: Clear the page tables for the ESB IO mapping · a101950f
      Cédric Le Goater committed
      Commit 1ca3dec2 ("powerpc/xive: Prevent page fault issues in the
      machine crash handler") fixed an issue in the FW-assisted dump of
      machines using the hash MMU and the XIVE interrupt mode under the
      POWER hypervisor. It forced the mapping of the ESB pages of
      interrupts mapped into the Linux IRQ number space, to make sure the
      'crash kexec' sequence worked during such an event. But it did not
      handle the unmapping.
      
      This mapping is now blocking the removal of a passthrough IO adapter
      under the POWER hypervisor because it expects the guest OS to have
      cleared all page table entries related to the adapter. If some are
      still present, the RTAS call which isolates the PCI slot returns error
      9001 "valid outstanding translations".
      
      Remove these mappings in the IRQ data cleanup routine.
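      A minimal sketch of such a cleanup, assuming the xive_irq_data
      field names (eoi_mmio, trig_mmio, esb_shift) and the
      unmap_kernel_range() API available in kernels of that era:

        /* Sketch: clear the kernel page-table entries for the ESB pages
         * before releasing the mappings, so no stale translations remain
         * when the hypervisor isolates the PCI slot. */
        static void xive_cleanup_esb_mappings(struct xive_irq_data *xd)
        {
                unsigned int esb_len = 1u << xd->esb_shift;

                if (xd->eoi_mmio) {
                        unmap_kernel_range((unsigned long)xd->eoi_mmio, esb_len);
                        iounmap(xd->eoi_mmio);
                        /* EOI and trigger pages may alias each other */
                        if (xd->eoi_mmio == xd->trig_mmio)
                                xd->trig_mmio = NULL;
                        xd->eoi_mmio = NULL;
                }
                if (xd->trig_mmio) {
                        unmap_kernel_range((unsigned long)xd->trig_mmio, esb_len);
                        iounmap(xd->trig_mmio);
                        xd->trig_mmio = NULL;
                }
        }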
      
      Under KVM, this cleanup is not required because the ESB pages for
      the adapter interrupts are unmapped from the guest by the hypervisor
      in the KVM XIVE native device. The new cleanup is therefore
      redundant there, but harmless.
      
      Fixes: 1ca3dec2 ("powerpc/xive: Prevent page fault issues in the machine crash handler")
      Cc: stable@vger.kernel.org # v5.5+
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200429075122.1216388-2-clg@kaod.org
      a101950f
  9. 07 May 2020, 1 commit
    • powerpc/xive: Enforce load-after-store ordering when StoreEOI is active · b1f9be93
      Cédric Le Goater committed
      When an interrupt has been handled, the OS notifies the interrupt
      controller with an EOI sequence. On a POWER9 system using the XIVE
      interrupt controller, this can be done with a load or a store
      operation on the ESB interrupt management page of the interrupt. The
      StoreEOI operation has less latency and improves interrupt handling
      performance, but it was deactivated during the POWER9 DD2.0
      timeframe because of ordering issues. We use LoadEOI today, but plan
      to reactivate StoreEOI in future architectures.
      
      There is usually no need to enforce ordering between ESB load and
      store operations as they should lead to the same result. E.g. a store
      trigger and a load EOI can be executed in any order. Assuming the
      interrupt state is PQ=10, a store trigger followed by a load EOI will
      return a Q bit. In the reverse order, it will create a new interrupt
      trigger from HW. In both cases, the handler processing interrupts is
      notified.
      
      In some cases, the XIVE_ESB_SET_PQ_10 load operation is used to
      temporarily disable the interrupt source (mask/unmask). When the
      source is re-enabled, the OS can detect if interrupts were received
      while the source was disabled and reinject them. This process needs
      special care when StoreEOI is activated: the ESB load and store
      operations must be correctly ordered, because a XIVE_ESB_STORE_EOI
      operation could leave the source enabled if it has not completed
      before the loads.
      
      For those cases, we enforce Load-after-Store ordering with a special
      load operation offset. To avoid performance impact, this ordering is
      only enforced when really needed, that is when interrupt sources are
      temporarily disabled with the XIVE_ESB_SET_PQ_10 load. It should not
      be needed for other loads.
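      A sketch of the hot path, assuming the XIVE_ESB_LD_ST_MO offset
      value from the patch and a simplified read helper (the real
      xive_esb_read() also handles hypervisor-mediated ESB accesses):

        #define XIVE_ESB_LD_ST_MO  0x40  /* Load-after-store ordering */

        /* Sketch of xive_esb_read(): the ordering offset is ORed in only
         * for the PQ=10 (mask) load on StoreEOI sources, so the common
         * EOI loads keep their fast path. */
        static u64 xive_esb_read(struct xive_irq_data *xd, u32 offset)
        {
                if (offset == XIVE_ESB_SET_PQ_10 &&
                    xd->flags & XIVE_IRQ_FLAG_STORE_EOI)
                        offset |= XIVE_ESB_LD_ST_MO;

                return in_be64(xd->eoi_mmio + offset);
        }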
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200220081506.31209-1-clg@kaod.org
      b1f9be93
  10. 26 Mar 2020, 4 commits
  11. 22 Jan 2020, 1 commit
  12. 13 Nov 2019, 1 commit
  13. 13 Sep 2019, 2 commits
  14. 19 Aug 2019, 1 commit
  15. 16 Aug 2019, 1 commit
    • powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race · da15c03b
      Paul Mackerras committed
      Testing has revealed the existence of a race condition where a XIVE
      interrupt being shut down can be in one of the XIVE interrupt queues
      (of which there are up to 8 per CPU, one for each priority) at the
      point where free_irq() is called.  If this happens, the interrupt
      fetch from the queue can return an interrupt number which has
      already been shut down.  This can lead to various symptoms:
      
      - irq_to_desc(irq) can be NULL.  In this case, no end-of-interrupt
        function gets called, so the CPU's elevated interrupt priority
        (numerically lowered CPPR) is never reset.  The CPU then stops
        processing interrupts, causing device timeouts and other errors
        in various device drivers.
      
      - The irq descriptor or related data structures can be in the process
        of being freed as the interrupt code is using them.  This typically
        leads to crashes due to bad pointer dereferences.
      
      This race is basically what commit 62e04686 ("genirq: Add optional
      hardware synchronization for shutdown", 2019-06-28) is intended to
      fix, given a get_irqchip_state() method for the interrupt controller
      being used.  It works by polling the interrupt controller when an
      interrupt is being freed until the controller says it is not pending.
      
      With XIVE, the PQ bits of the interrupt source indicate the state of
      the interrupt source, and in particular the P bit goes from 0 to 1 at
      the point where the hardware writes an entry into the interrupt queue
      that this interrupt is directed towards.  Normally, the code will then
      process the interrupt and do an end-of-interrupt (EOI) operation which
      will reset PQ to 00 (assuming another interrupt hasn't been generated
      in the meantime).  However, there are situations where the code resets
      P even though a queue entry exists (for example, by setting PQ to 01,
      which disables the interrupt source), and also situations where the
      code leaves P at 1 after removing the queue entry (for example, this
      is done for escalation interrupts so they cannot fire again until
      they are explicitly re-enabled).
      
      The code already has a 'saved_p' flag for the interrupt source which
      indicates that a queue entry exists, although it isn't maintained
      consistently.  This patch adds a 'stale_p' flag to indicate that
      P has been left at 1 after processing a queue entry, and adds code
      to set and clear saved_p and stale_p as necessary to maintain a
      consistent indication of whether a queue entry may or may not exist.
      
      With this, we can implement xive_get_irqchip_state() by looking at
      stale_p, saved_p and the ESB PQ bits for the interrupt.
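      A sketch of that check, modeled on the genirq get_irqchip_state()
      hook; the exact combination of flags and the PQ read shown here is
      an assumption:

        /* Sketch: an interrupt is "active" if a queue entry may exist,
         * i.e. saved_p is set or the ESB P bit reads back as 1, unless
         * stale_p says the P bit was deliberately left behind. */
        static int xive_get_irqchip_state(struct irq_data *data,
                                          enum irqchip_irq_state which,
                                          bool *state)
        {
                struct xive_irq_data *xd =
                        irq_data_get_irq_handler_data(data);

                switch (which) {
                case IRQCHIP_STATE_ACTIVE:
                        *state = !xd->stale_p &&
                                 (xd->saved_p ||
                                  !!(xive_esb_read(xd, XIVE_ESB_GET) &
                                     XIVE_ESB_VAL_P));
                        return 0;
                default:
                        return -EINVAL;
                }
        }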
      
      There is some additional code to handle escalation interrupts
      properly, because they are enabled and disabled in KVM assembly
      code, which does not have access to the xive_irq_data struct for the
      escalation interrupt.  Hence, stale_p may be incorrect when the
      escalation interrupt is freed in kvmppc_xive_{,native_}cleanup_vcpu().
      Fortunately, we can fix it up by looking at vcpu->arch.xive_esc_on,
      with some careful attention to barriers in order to ensure the correct
      result if xive_esc_irq() races with kvmppc_xive_cleanup_vcpu().
      
      Finally, this adds code to make noise on the console (pr_crit and
      WARN_ON(1)) if we find an interrupt queue entry for an interrupt
      which does not have a descriptor.  While this won't catch the race
      reliably, if it does get triggered it will be an indication that
      the race is occurring and needs to be debugged.
      
      Fixes: 243e2511 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190813100648.GE9567@blackberry
      da15c03b
  16. 05 Aug 2019, 1 commit
  17. 18 Jul 2019, 1 commit
    • powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask() · 4d202c8c
      Gautham R. Shenoy committed
      xive_find_target_in_mask() has the following for(;;) loop, which
      has a bug when @first == cpumask_first(@mask) and condition 1 fails
      to hold for every CPU in @mask. In this case we loop forever.
      
        first = cpu;
        for (;;) {
                if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
                        return cpu;
                cpu = cpumask_next(cpu, mask);
                if (cpu == first) // condition 2
                        break;

                if (cpu >= nr_cpu_ids) // condition 3
                        cpu = cpumask_first(mask);
        }
      
      This is because, when @first == cpumask_first(@mask), we never hit
      condition 2 (cpu == first): prior to this check, we have executed
      "cpu = cpumask_next(cpu, mask)", which sets the value of @cpu to a
      value greater than @first or to nr_cpu_ids. Coupled with the fact
      that condition 1 is never met, we never exit this loop.
      
      This was discovered by the hard-lockup detector while running LTP test
      concurrently with SMT switch tests.
      
       watchdog: CPU 12 detected hard LOCKUP on other CPUs 68
       watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago)
       watchdog: CPU 68 Hard LOCKUP
       watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago)
       CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1
       NIP:  c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000
       REGS: c000201fff3c7d80 TRAP: 0100   Not tainted  (4.18.0-100.el8.ppc64le)
       MSR:  9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 24028424  XER: 00000000
       CFAR: c0000000006f558c IRQMASK: 1
       GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18
       GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8
       GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8
       GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000
       GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001
       GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18
       GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0
       GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f
       NIP [c0000000006f5578] find_next_bit+0x38/0x90
       LR [c000000000cba9ec] cpumask_next+0x2c/0x50
       Call Trace:
       [c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable)
       [c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240
       [c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0
       [c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260
       [c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0
       [c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0
       [c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90
       [c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220
       [c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x]
       [c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x]
       [c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0
       [c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50
       [c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0
       [c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0
       [c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70
       [c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160
       [c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70
       Instruction dump:
       78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039
       4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040
      
      To fix this, move the check for condition 2 after the check for
      condition 3, so that we are able to break out of the loop soon after
      iterating through all the CPUs in @mask in the problem case. Use a
      do-while loop to achieve this, as sketched below.
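      A sketch of the fixed loop, mirroring the reordering described
      above (the surrounding function is elided):

        /* Fixed walk: handle the wrap (condition 3) before testing for
         * termination, so one full pass over @mask always terminates,
         * even when @first == cpumask_first(@mask). */
        first = cpu;
        do {
                if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
                        return cpu;
                cpu = cpumask_next(cpu, mask);
                if (cpu >= nr_cpu_ids) // condition 3
                        cpu = cpumask_first(mask);
        } while (cpu != first); // condition 2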
      
      Fixes: 243e2511 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Reported-by: Indira P. Joga <indira.priya@in.ibm.com>
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1563359724-13931-1-git-send-email-ego@linux.vnet.ibm.com
      4d202c8c
  18. 31 May 2019, 1 commit
  19. 15 Jan 2019, 1 commit
  20. 25 Nov 2018, 1 commit
  21. 03 Oct 2018, 1 commit
  22. 24 Aug 2018, 1 commit